Abstract

We consider a two-user state-dependent multiaccess channel in which the states of the channel
are known non-causally to one of the encoders and only strictly causally to the other encoder.
Both encoders transmit a common message and, in addition, the encoder that knows the states 
non-causally transmits an individual message. We find explicit characterizations of the capacity 
region of this communication model in both discrete memoryless and memoryless Gaussian 
cases. In particular, the capacity region analysis demonstrates the utility of knowing the
states only strictly causally at the encoder that sends only the common message. More
specifically, in the discrete memoryless setting we show that such knowledge is beneficial and
increases the capacity region in general. In the Gaussian setting, we show that such knowledge
does not help, and the capacity region is the same as if the states were completely unknown at the encoder
that sends only the common message. Furthermore, we also study the special case in which the
two encoders transmit only the common message and show that the knowledge of the states
only strictly causally at the encoder that sends only the common message is not beneficial in this
case, in both discrete memoryless and memoryless Gaussian settings. The analysis also reveals
optimal ways of exploiting the knowledge of the state only strictly causally at the encoder
that sends only the common message when such knowledge is beneficial. The encoders
collaborate to convey to the decoder a lossy version of the state, in addition to transmitting the 
information messages through a generalized Gel'fand-Pinsker binning. Particularly important 
in this problem are the questions of 1) optimal ways of performing the state compression and 
2) whether or not the compression indices should be decoded uniquely. By developing two 
optimal coding schemes that perform this state compression differently, we show that when 
used as parts of appropriately tuned encoding and decoding processes, both compression
à la noisy network coding, i.e., with no binning, and compression using Wyner-Ziv binning
are optimal. The scheme that uses Wyner-Ziv binning shares elements with Cover and El
Gamal's original compress-and-forward, but differs from it mainly in that backward decoding is
employed instead of forward decoding and the compression indices are not decoded uniquely.
Finally, by exploring the properties of our outer bound, we show that, although not required in
general, the compression indices can in fact be decoded uniquely essentially without altering the
capacity region, but at the expense of larger alphabet sizes for the auxiliary random variables.



I. Introduction 

The study of channels that are controlled by random states has spurred much interest, due to its importance 
from both information-theoretic and communications aspects. For example, state-dependent channels may model 
communication in random fading environments [1] or in the presence of interference imposed by adjacent users. 
The channel states may be known in a strictly-causal, causal or noncausal manner, to all or only a subset of the 
encoders. For a transmission of length $n$, let $S^n = (S_1, S_2, \ldots, S_n)$ denote the state sequence, with $S_i$ representing the
channel state affecting the channel at time or block $i$. For the transmission in block $i$, the state sequence is known
non-causally if it is known entirely before the beginning of the transmission. It is known causally if it is known up
to and including time $i$; and it is known strictly causally if it is known only up to time $i - 1$. The way the channel
state information is utilized and influences capacity depends also on which of the encoder(s) and decoder(s) are
aware of it. In single user channels, the concept of channel state available at only the transmitter dates back to 
Shannon [2] for the causal channel state case, and to Gel'fand and Pinsker [3] for the non-causal channel state case. 
In multiuser environments, a growing body of work studies multi-user state-dependent models. Recent advances 
in this regard can be found in [4]-[27], and many other works. For a comprehensive review of state-dependent 
channels and related work, the reader may refer to [4]. 
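
To make the three notions of state knowledge concrete, the input of an encoder at time $i$ can be written as a function of its message and of the portion of the state sequence available to it. The following is only a schematic summary: the generic message $W$ and the mapping names $\phi_i$ are illustrative placeholders, not notation fixed at this point of the paper, and $S^k = (S_1, \ldots, S_k)$.

```latex
\begin{align*}
\text{non-causal:}      \quad & X_i = \phi_i(W, S^n),     && i = 1, \ldots, n, \\
\text{causal:}          \quad & X_i = \phi_i(W, S^{i}),   && i = 1, \ldots, n, \\
\text{strictly causal:} \quad & X_i = \phi_i(W, S^{i-1}), && i = 1, \ldots, n.
\end{align*}
```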

There is a connection between the role of states known strictly causally at an encoder and that of output 
feedback given to that encoder. In single-user channels, it is now well known that strictly causal feedback does not 
increase the capacity [28]. In multiuser channels or networks, however, the situation changes drastically, and output
feedback can be beneficial, but its role is still largely misunderstood. One has a similar picture with strictly causal
states at the encoder. In single-user channels, independent and identically distributed states available only in a 
strictly causal manner at the encoder have no effect on the capacity. In multiuser channels or networks, however, 
like feedback, strictly causal states in general increase the capacity. 

Advances in the study of the effect of strictly causal states in multiuser channels are quite recent and
concern mainly multiple access scenarios. In [15], Lapidoth and Steinberg study a two-encoder multiple access 
channel with independent messages and states known causally at the encoders. They show that the strictly causal 
state sequence can be beneficial, in the sense that it increases the capacity for this model. This result is reminiscent 
of Dueck's proof [29] that feedback can increase the capacity region of some broadcast channels. In accordance 
with [29], the main idea of the achievability result in [15] is a block Markov coding scheme in which the two users 
collaborate to describe the state to the decoder by sending cooperatively a compressed version of it. As noticed in 
[15], although some non-zero rate that otherwise could be used to transmit pure information is spent in describing 
the state to the decoder, the net effect can be an increase in the capacity. In [16], they show that strictly causal 
state information is beneficial even if the channel is controlled by two independent states each known to one 
encoder strictly causally. In this case, each encoder can help the other encoder transmit at a higher rate by sending 
a compressed version of its state to the decoder. In [18], Li, Simeone and Yener improve the results of [15], [16] and 
extend them to the case of multiple encoders. The achievability results in [18] are inspired by the noisy network 
coding scheme of [30] and, unlike [15], [16], do not use Wyner-Ziv binning [31] for the compression of the state. In 
a very recent contribution [32], Lapidoth and Steinberg derive a new inner bound on the capacity region for the 
case of a single state governing the multiaccess channel. They also prove that the inner bound of [18] for the case 
of two independent states each known strictly causally to one encoder can indeed be strictly better than previous
bounds in [15], [16], a result that was previously conjectured by Li, Simeone and Yener in [18].

A. Studied Model 

In this paper, which generalizes a former conference version [33], we study a two-user state-dependent multiple 
access channel with the channel states known non-causally at one encoder and only strictly causally at the other 
encoder. The decoder is not aware of the channel states. As shown in Figure 1, both encoders transmit a common
message and, in addition, the encoder that knows the states non-causally transmits an individual message. This
model generalizes one whose capacity region is established in [5] and in which the encoder that sends only the
common message does not know the states at all. More precisely, let $W_c$ and $W_1$ denote the common message and the
individual message to be transmitted in, say, $n$ uses of the channel; and let $S^n = (S_1, \ldots, S_n)$ denote the state sequence
affecting the channel during this time. At time $i$, Encoder 1 knows the complete sequence $S^n = (S_1, \ldots, S_{i-1}, S_i, \ldots, S_n)$
and sends $X_{1,i} = \phi_{1,i}(W_c, W_1, S^n)$, and Encoder 2 knows only $S^{i-1} = (S_1, \ldots, S_{i-1})$ and sends $X_{2,i} = \phi_{2,i}(W_c, S^{i-1})$; the
functions $\phi_{1,i}$ and $\phi_{2,i}$ are some encoding functions. In this paper, we study the capacity region of this state-dependent
MAC model. As our analysis will show, this requires, among other things, understanding the role of the strictly causal
part of the state that is revealed to Encoder 2.

Fig. 1. State-dependent MAC with degraded message sets and states known noncausally at the encoder that sends
both messages and only strictly causally at the other encoder. 



B. Main Contributions 

In the discrete memoryless case, we characterize the capacity region for the general finite-alphabet case with a 
single-letter expression. The proof of the achievability part is based on a block-Markov coding scheme in which 
the two encoders collaborate to convey a lossy version of the state to the decoder, in the spirit of [15], [16], [32], 
in addition to a generalized Gel'fand-Pinsker binning for the transmission of the information messages [3]. From 
the angle of the state compression, coding schemes that perform the state compression for our model tie in with very
recent works on compression in compress-and-forward type relaying networks [30], [34]-[36]. We first develop a
coding scheme in which the state compression is performed à la the noisy network coding scheme of Kim et al. and show
that it is optimal, i.e., achieves an outer bound that we establish for the studied model. In this coding scheme, unlike
[15], [16], [32] where every information message is divided into blocks and different submessages are sent over these 
blocks and then decoded one at a time using the same codebook as in the original compress-and-forward scheme by 
Cover and El Gamal [37], here the entire common message and the entire individual message are transmitted over 
all blocks using codebooks that are generated independently, one for each block, and the decoding is performed 
simultaneously using all blocks as in the noisy network coding scheme of [30]. Also, like [30], at each block the 
compression index of the state of the previous block is sent using standard rate distortion, not Wyner-Ziv binning. 
At the end of the transmission, the receiver uses the outputs of all blocks to perform simultaneous decoding of
the common and individual information messages, without uniquely decoding the compression indices. From this
angle, our coding scheme connects more with [18] than with [15], [16] and [32].

Two of the most important features of our coding scheme that is based on noisy network coding are i) standard 
compression without Wyner-Ziv binning and ii) non-explicit decoding of the compression indices. Investigating 
whether these features are pivotal for optimality in our problem, as argued in [30] for some related models, we also 
explore binning-based compressions. We show that the capacity region of our model can also be achieved using 
an alternate coding scheme in which the state compression is realized using Wyner-Ziv binning. The employed
optimal alternate coding scheme shares elements with Cover and El Gamal's compress-and-forward [37], but differs
from it in two aspects: 1) backward decoding is utilized instead of the forward decoding of [37], and 2) unlike [37], 
the compression indices are not decoded uniquely. Decoding backwardly instead of forwardly seems essential 
for the optimality of this alternate coding scheme here. At this level, we note that the finding in this paper that 
backward decoding with non-unique decoding of the compression indices is beneficial, may hold more generally 
in other scenarios that involve Wyner-Ziv binning. In the fading setting, this is also observed in [38]. Next, by 
exploring our outer bound further, we show that, although not required, one can modify this coding scheme so that
the compression indices are decoded at the receiver essentially without altering the capacity region,
but at the expense of larger alphabet sizes for the involved auxiliary random variables. The decoding of the
compression indices introduces an additional rate constraint; but we show that this constraint is satisfied by the 
auxiliary random variables of the outer bound.

The single-letter characterization of the capacity region of our model remains intact if one allows feedback to 
the encoder that sends both messages. Also, the capacity region of our model contains that of the model of [5] 
in which the encoder that sends only the common message is unaware of the channel states; and this shows that 
revealing the states even only strictly causally to this encoder potentially increases the capacity region. Next, by 
investigating a discrete memoryless example, we show that this inclusion can be strict, thus demonstrating the 
utility of conveying a compressed version of the state to the decoder cooperatively by the encoders. 

We also specialize our results to the case in which the two encoders send only the common message. We refer 
to the capacity in this case as the common-message capacity. We show that, when one of the two encoders knows the states
noncausally, knowledge of the states only strictly causally at the other encoder does not increase the common-message
capacity. It should be noted that this result is not a direct consequence of the fact that feedback does not increase
the capacity of a multiaccess channel in which the encoders send only a common message; and our converse proof
is needed here.

Next, we consider the memoryless Gaussian setting in which the channel state and the noise are additive and 
Gaussian. We establish an operative outer bound on the achievable rate pairs. Then, we show that this outer bound 
is achievable, yielding a closed-form expression of the capacity region. The resulting capacity region coincides 
with that of the model of [5] in which the encoder that sends only the common message is completely unaware 
of the states, thus demonstrating that, in contrast to the discrete memoryless case, revealing the states strictly
causally to this encoder is not beneficial in the Gaussian case, in the sense that it does not increase the capacity
region.

Finally, we note that in contrast to the related MAC models in [5], [7], our converse proofs in this paper do not 
follow directly from the converse part proof of the capacity formula for the standard Gel'fand-Pinsker channel [3]. 
This is because, at time $i$, the encoder that transmits only the common message sends inputs that are functions of
not only that message, but also the observed past state sequence.

C. Outline and Notation

An outline of the remainder of this paper is as follows. Section II describes in more detail the communication
model that we consider in this work. Section III provides the capacity region of the discrete memoryless model. In
this section we also establish an alternative outer bound on the capacity region that will turn out to be useful in the
Gaussian case, provide an example demonstrating the utility of revealing the states only strictly causally to the
encoder that sends only the common message, and derive the common-message capacity. Section IV characterizes
the capacity region as well as the common-message capacity of the Gaussian model. Finally, Section V concludes
the paper.

We use the following notations throughout the paper. Upper case letters are used to denote random variables,
e.g., $X$; lower case letters are used to denote realizations of random variables, e.g., $x$; and calligraphic letters
designate alphabets, e.g., $\mathcal{X}$. The probability distribution of a random variable $X$ is denoted by $P_X(x)$. Sometimes,
for convenience, we write it as $P_X$. We use the notation $\mathbb{E}_X[\cdot]$ to denote the expectation of the random variable $X$. A
probability distribution of a random variable $Y$ given $X$ is denoted by $P_{Y|X}$. The set of probability distributions
defined on an alphabet $\mathcal{X}$ is denoted by $\mathbb{P}(\mathcal{X})$. The cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$. For convenience, the
length-$n$ vector $x^n$ will occasionally be denoted in boldface notation $\mathbf{x}$. The Gaussian distribution with mean $\mu$ and
variance $\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$. For integers $i \le j$, we define $[i : j] := \{i, i+1, \ldots, j\}$. Finally, throughout the paper,
logarithms are taken to base 2, and the complement to unity of a scalar $u \in [0, 1]$ is denoted by $\bar{u}$, i.e., $\bar{u} = 1 - u$.

II. System Model and Definitions 

We consider a stationary memoryless state-dependent MAC $W_{Y|X_1,X_2,S}$ whose output $Y \in \mathcal{Y}$ is controlled by the
channel inputs $X_1 \in \mathcal{X}_1$ and $X_2 \in \mathcal{X}_2$ from the encoders and the channel state $S \in \mathcal{S}$, which is drawn according to
a memoryless probability law $Q_S$. We assume that the channel state $S^n$ is known non-causally at Encoder 1, i.e.,
beforehand, at the beginning of the transmission block. Encoder 2 knows the channel states only strictly causally;
that is, at time $i$ it knows the states only up to time $i - 1$, $S^{i-1} = (S_1, \ldots, S_{i-1})$.

Encoder 2 wants to send a common message $W_c$ and Encoder 1 wants to send an independent individual message
$W_1$ along with the common message $W_c$. We assume that the common message $W_c$ and the individual message
$W_1$ are independent random variables drawn uniformly from the sets $\mathcal{W}_c = \{1, \ldots, M_c\}$ and $\mathcal{W}_1 = \{1, \ldots, M_1\}$,
respectively. The sequences $X_1^n$ and $X_2^n$ from the encoders are sent across a state-dependent multiple access channel
modeled as a memoryless conditional probability distribution $W_{Y|X_1,X_2,S}$. The joint probability mass function on
$\mathcal{W}_c \times \mathcal{W}_1 \times \mathcal{S}^n \times \mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n$ is given by

$$P(w_c, w_1, s^n, x_1^n, x_2^n, y^n) = P(w_c) P(w_1) \prod_{i=1}^{n} Q_S(s_i) P(x_{1,i}|w_c, w_1, s^n) P(x_{2,i}|w_c, s^{i-1}) \, W_{Y|X_1,X_2,S}(y_i|x_{1,i}, x_{2,i}, s_i). \qquad (1)$$

The receiver guesses the pair $(W_c, W_1)$ from the channel output $Y^n$.

Definition 1: For positive integers $n$, $M_c$ and $M_1$, an $(M_c, M_1, n, \epsilon)$ code for the multiple access channel with
states known noncausally at one encoder and only strictly causally at the other encoder consists of a mapping

$$\phi_1 : \mathcal{W}_c \times \mathcal{W}_1 \times \mathcal{S}^n \to \mathcal{X}_1^n \qquad (2)$$

at Encoder 1, a sequence of mappings

$$\phi_{2,i} : \mathcal{W}_c \times \mathcal{S}^{i-1} \to \mathcal{X}_2, \quad i = 1, \ldots, n \qquad (3)$$

at Encoder 2, and a decoder map

$$\psi : \mathcal{Y}^n \to \mathcal{W}_c \times \mathcal{W}_1 \qquad (4)$$

such that the average probability of error is bounded by $\epsilon$,

$$P_e^n = \mathbb{E}_S\big[\Pr\big(\psi(Y^n) \neq (W_c, W_1) \,\big|\, S^n = s^n\big)\big] \le \epsilon. \qquad (5)$$

The rate of the common message and the rate of the individual message are defined as

$$R_c = \frac{1}{n} \log M_c \quad \text{and} \quad R_1 = \frac{1}{n} \log M_1, \qquad (6)$$

respectively.

A rate pair $(R_c, R_1)$ is said to be achievable if for every $\epsilon > 0$ there exists a $(2^{nR_c}, 2^{nR_1}, n, \epsilon)$ code for the channel
$W_{Y|X_1,X_2,S}$. The capacity region of the considered state-dependent MAC is defined as the closure of the set of
achievable rate pairs.

III. Discrete Memoryless Case 
In this section, it is assumed that the alphabets $\mathcal{S}$, $\mathcal{X}_1$, $\mathcal{X}_2$ are finite.

A. Capacity Region 

Let $\mathcal{P}$ stand for the collection of all random variables $(S, U, V, X_1, X_2, Y)$ such that $U$, $V$, $X_1$ and $X_2$ take values in
finite alphabets $\mathcal{U}$, $\mathcal{V}$, $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively, and

$$P_{S,U,V,X_1,X_2,Y}(s,u,v,x_1,x_2,y) = P_{S,U,V,X_1,X_2}(s,u,v,x_1,x_2) \, W_{Y|X_1,X_2,S}(y|x_1,x_2,s) \qquad (7a)$$

$$P_{S,U,V,X_1,X_2}(s,u,v,x_1,x_2) = Q_S(s) P_{X_2}(x_2) P_{V|S,X_2}(v|s,x_2) P_{U,X_1|S,V,X_2}(u,x_1|s,v,x_2) \qquad (7b)$$

$$\sum_{u,v,x_1,x_2} P_{S,U,V,X_1,X_2}(s,u,v,x_1,x_2) = Q_S(s). \qquad (7c)$$

The relations in (7) imply that $(U, V) \leftrightarrow (S, X_1, X_2) \leftrightarrow Y$ is a Markov chain, and $X_2$ is independent of $S$.
Define $\mathcal{C}$ to be the set of all rate pairs $(R_c, R_1)$ such that

$$R_1 \le I(U;Y|V,X_2) - I(U;S|V,X_2)$$
$$R_c + R_1 \le I(U,V,X_2;Y) - I(U,V,X_2;S)$$

for some $(S, U, V, X_1, X_2, Y) \in \mathcal{P}$. (8)
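
As a numerical illustration of the region defined in (8), the following sketch evaluates the right-hand sides of the two rate constraints for one fixed joint distribution. It is not part of the paper's proofs. The helper names (`entropy`, `joint_entropy`, `cond_mi`, `rate_bounds`) and the array layout, with axes ordered as $(S, U, V, X_2, Y)$ and $X_1$ already summed out, are assumptions made only for this example. Obtaining the region itself would additionally require a union over all admissible distributions in $\mathcal{P}$, which the sketch does not attempt.

```python
import numpy as np

def entropy(pmf):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(pmf, dtype=float).ravel()
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())

def joint_entropy(joint, axes):
    """Entropy of the marginal of `joint` on the given axes (0 if no axes)."""
    if not axes:
        return 0.0
    drop = tuple(ax for ax in range(joint.ndim) if ax not in axes)
    marg = joint.sum(axis=drop) if drop else joint
    return entropy(marg)

def cond_mi(joint, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); a, b, c are axis tuples."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (joint_entropy(joint, a + c) + joint_entropy(joint, b + c)
            - joint_entropy(joint, a + b + c) - joint_entropy(joint, c))

# Assumed axis order of the joint pmf array: (S, U, V, X2, Y).
S, U, V, X2, Y = range(5)

def rate_bounds(joint):
    """Right-hand sides of the two constraints in (8) for one joint pmf."""
    r1 = cond_mi(joint, (U,), (Y,), (V, X2)) - cond_mi(joint, (U,), (S,), (V, X2))
    rsum = cond_mi(joint, (U, V, X2), (Y,)) - cond_mi(joint, (U, V, X2), (S,))
    return r1, rsum

# Toy distribution: binary S and U independent and uniform,
# V and X2 constant, and a noiseless output Y = U.
joint = np.zeros((2, 2, 1, 1, 2))
for s in range(2):
    for u in range(2):
        joint[s, u, 0, 0, u] = 0.25

print(rate_bounds(joint))
```

In this toy case the state plays no role and the output reveals $U$ exactly, so both bounds evaluate to one bit.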

The following proposition states some properties of $\mathcal{C}$.
Proposition 1:
1. The set $\mathcal{C}$ is convex.
2. To exhaust $\mathcal{C}$, it is enough to restrict $\mathcal{V}$ and $\mathcal{U}$ to satisfy

$$|\mathcal{V}| \le |\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2| + 1 \qquad (9a)$$
$$|\mathcal{U}| \le \big(|\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2| + 1\big)|\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2|. \qquad (9b)$$

Proof: The proof of Proposition 1 appears in Appendix A.
As stated in the following theorem, the set $\mathcal{C}$ characterizes the capacity region of the state-dependent discrete
memoryless MAC model that we study.

Theorem 1: The capacity region of the multiple access channel with states known only strictly causally at the
encoder that sends the common message and non-causally at the encoder that sends both messages is given by $\mathcal{C}$.

Proof: An outline of the coding scheme that we use for the direct part will follow. The associated error
analysis and the proof of the converse appear in Appendix B.

Theorem 1 continues to hold if in (7) we replace $P_{U|S,V,X_2}$ by $P_{U|S,V}$. Also, it should be noted that setting $V = \emptyset$ in
(8), the capacity region $\mathcal{C}$ reduces to the union of all rate pairs $(R_c, R_1)$ satisfying

$$R_1 \le I(U;Y|X_2) - I(U;S|X_2)$$

$$R_c + R_1 \le I(U,X_2;Y) - I(U,X_2;S) \qquad (10)$$

for some measure on $\mathcal{S} \times \mathcal{U} \times \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}$ of the form

$$P_{S,U,X_1,X_2,Y} = Q_S P_{X_2} P_{U,X_1|S,X_2} W_{Y|X_1,X_2,S}. \qquad (11)$$

Let $\mathcal{C}'$ denote the region defined by (10) and (11) in the remainder of this paper. It has been shown in [5] that the
region $\mathcal{C}'$ is the capacity region of the MAC model of Figure 1 but with the states completely unknown at Encoder
2, i.e., while the encoding at Encoder 1 is given by (2), the encoding at Encoder 2 is defined by the mapping

$$\phi_2 : \mathcal{W}_c \to \mathcal{X}_2^n. \qquad (12)$$

The inclusion $\mathcal{C}' \subseteq \mathcal{C}$ shows that the knowledge of the states only strictly causally at Encoder 2 in our model
potentially increases the capacity region. In Section III-B we will show that the inclusion can be strict, i.e., $\mathcal{C}' \subsetneq \mathcal{C}$.
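
The inclusion can be checked directly from the definitions. The following is a short sketch, assuming that $V$ is taken to be a constant so that conditioning on $V$ is vacuous and the factor $P_{V|S,X_2}$ disappears from (7b).

```latex
\begin{align*}
I(U;Y|V,X_2) - I(U;S|V,X_2) &= I(U;Y|X_2) - I(U;S|X_2), \\
I(U,V,X_2;Y) - I(U,V,X_2;S) &= I(U,X_2;Y) - I(U,X_2;S), \\
Q_S \, P_{X_2} \, P_{V|S,X_2} \, P_{U,X_1|S,V,X_2} &= Q_S \, P_{X_2} \, P_{U,X_1|S,X_2}.
\end{align*}
```

Hence every rate pair allowed by (10) and (11) is also allowed by (8) under this particular choice of $V$.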

Furthermore, one can easily check that in the case of a channel that does not depend on the states, i.e., $W_{Y|X_1,X_2,S} = W_{Y|X_1,X_2}$,
the capacity region $\mathcal{C}$ reduces to the closure of the union of all rate pairs $(R_c, R_1)$ satisfying

$$R_1 \le I(X_1;Y|Z,X_2)$$

$$R_c + R_1 \le I(X_1,X_2;Y) \qquad (13)$$

for some 

Pz, Xi ,x 2 ,y = PzPjkv^^yfmA- ( 14 ) 



January 17, 2012 



DRAFT 




Also, it is noted that Theorem 1 remains intact if we allow feedback to Encoder 1, i.e., before producing the $i$th
channel input symbol, Encoder 1 also observes the past channel output sequence $Y^{i-1}$. That is, the encoding at
Encoder 2 is still given by (3) and that at Encoder 1 is replaced by a sequence of mappings $\{\phi_{1,i}\}_{i=1}^{n}$, with

$$\phi_{1,i} : \mathcal{W}_c \times \mathcal{W}_1 \times \mathcal{S}^n \times \mathcal{Y}^{i-1} \to \mathcal{X}_1. \qquad (15)$$

We now turn to the proof of achievability of Theorem 1. The following remark is useful for a better understanding
of the coding scheme that we use to establish the achievability of Theorem 1.

Remark 1: The proof of achievability of Theorem 1 is based on a block-Markov coding scheme in which a lossy
version of the state is conveyed to the decoder, in the spirit of [15], [16], [32], in addition to a generalized
Gel'fand-Pinsker binning for the transmission of the information messages [3]. However, unlike [15], [16] and [32] where
Wyner-Ziv compression [31] is utilized for the transmission of the lossy version of the state, here, inspired by the
noisy network coding scheme of [30], at each block the compression index of the state of the previous block is sent
using standard rate distortion, not Wyner-Ziv binning. Also, unlike [15], [16] and [32] where every information
message is divided into blocks and different submessages are sent over these blocks and then decoded one at a
time using the same codebook as in the original compress-and-forward scheme by Cover and El Gamal [37], here
the entire common message and the entire individual message are transmitted over all blocks using codebooks that
are generated independently, one for each block, and the decoding is performed simultaneously using all blocks as
in [30]. At the end of the transmission, the receiver uses the outputs of all blocks to perform simultaneous decoding
of the common and individual information messages, without uniquely decoding the compression indices. □

Proof of Achievability: 

The transmission takes place in $B$ blocks. The common message $W_c$ and the individual message $W_1$ are sent over
all blocks. We thus have $B_{W_c} = nBR_c$, $B_{W_1} = nBR_1$, $N = nB$, $R_{W_c} = B_{W_c}/N = R_c$ and $R_{W_1} = B_{W_1}/N = R_1$, where $B_{W_c}$ is
the number of common message bits, $B_{W_1}$ is the number of individual message bits, $N$ is the number of channel
uses and $R_{W_c}$ and $R_{W_1}$ are the overall rates of the common and individual messages, respectively.
Codebook Generation: Fix a measure $P_{S,U,V,X_1,X_2,Y} \in \mathcal{P}$. Fix $\epsilon > 0$, $\eta_c > 0$, $\eta_1 > 0$, $\hat{\eta} > 0$, $\delta > 1$ and denote
$M_c = 2^{nB[R_c - \eta_c \epsilon]}$, $M_1 = 2^{nB[R_1 - \eta_1 \epsilon]}$, $M = 2^{n[\hat{R} + \hat{\eta} \epsilon]}$ and $J = 2^{n[I(U;S|V,X_2) + \delta \epsilon]}$.
We randomly and independently generate a codebook for each block. 

1) For each block $i$, $i = 1, \ldots, B$, we generate $M_c M$ independent and identically distributed (i.i.d.) codewords
$\mathbf{x}_{2,i}(w_c, t'_i)$ indexed by $w_c = 1, \ldots, M_c$, $t'_i = 1, \ldots, M$, each with i.i.d. components drawn according to $P_{X_2}$.

2) For each block $i$, for each codeword $\mathbf{x}_{2,i}(w_c, t'_i)$, we generate $M$ i.i.d. codewords $\mathbf{v}_i(w_c, t'_i, t_i)$ indexed by
$t_i = 1, \ldots, M$, each with i.i.d. components drawn according to $P_{V|X_2}$.

3) For each block $i$, for each codeword $\mathbf{x}_{2,i}(w_c, t'_i)$, for each codeword $\mathbf{v}_i(w_c, t'_i, t_i)$, we generate a collection of $J M_1$
i.i.d. codewords $\{\mathbf{u}_i(w_c, t'_i, t_i, w_1, j_i)\}$ indexed by $w_1 = 1, \ldots, M_1$, $j_i = 1, \ldots, J$, each with i.i.d. components drawn
according to $P_{U|V,X_2}$.

Encoding: Suppose that a common message $W_c = w_c$ and an individual message $W_1 = w_1$ are to be transmitted.
As we mentioned previously, $w_c$ and $w_1$ will be sent over all blocks. We denote by $\mathbf{s}[i]$ the state affecting the channel
in block $i$, $i = 1, \ldots, B$. For convenience, we let $\mathbf{s}[0] = \emptyset$ and $t_{-1} = t_0 = 1$ (a default value). The encoding at the
beginning of block $i$, $i = 1, \ldots, B$, is as follows.

Encoder 2, which has learned the state sequence $\mathbf{s}[i-1]$, knows $t_{i-2}$ and looks for a compression index $t_{i-1} \in [1 : M]$
such that $\mathbf{v}_{i-1}(w_c, t_{i-2}, t_{i-1})$ is strongly jointly typical with $\mathbf{s}[i-1]$ and $\mathbf{x}_{2,i-1}(w_c, t_{i-2})$. If there is no such index or the
observed state $\mathbf{s}[i-1]$ is not typical, $t_{i-1}$ is set to 1 and an error is declared. If there is more than one such index $t_{i-1}$,
choose the smallest. Encoder 2 then transmits the vector $\mathbf{x}_{2,i}(w_c, t_{i-1})$.

Encoder 1 obtains $\mathbf{x}_{2,i}(w_c, t_{i-1})$ similarly. It then finds the smallest compression index $t_i \in [1 : M]$ such that
$\mathbf{v}_i(w_c, t_{i-1}, t_i)$ is strongly jointly typical with $\mathbf{s}[i]$ and $\mathbf{x}_{2,i}(w_c, t_{i-1})$. Again, if there is no such index or the
observed state $\mathbf{s}[i]$ is not typical, $t_i$ is set to 1 and an error is declared. Next, Encoder 1 looks for the smallest $j_i$
such that $\mathbf{u}_i(w_c, t_{i-1}, t_i, w_1, j_i)$ is jointly typical with $\mathbf{s}[i]$ given $(\mathbf{x}_{2,i}(w_c, t_{i-1}), \mathbf{v}_i(w_c, t_{i-1}, t_i))$. Denote this $j_i$ by
$j_i^\star = j(\mathbf{s}[i], w_c, t_{i-1}, t_i, w_1)$. If such $j_i^\star$ is not found, an error is declared and $j(\mathbf{s}[i], w_c, t_{i-1}, t_i, w_1)$ is set to $j_i^\star = J$. Encoder
1 then transmits a vector $\mathbf{x}_1[i]$ which is drawn i.i.d. conditionally given $\mathbf{u}_i(w_c, t_{i-1}, t_i, w_1, j_i^\star)$, $\mathbf{s}[i]$, $\mathbf{v}_i(w_c, t_{i-1}, t_i)$ and
$\mathbf{x}_{2,i}(w_c, t_{i-1})$ (using the conditional measure $P_{X_1|U,S,V,X_2}$ induced by (7)).

Decoding: At the end of the transmission, the decoder has collected all the blocks of channel outputs $\mathbf{y}[1], \ldots, \mathbf{y}[B]$.
Step (a): The decoder estimates message $w_c$ using all blocks $i = 1, \ldots, B$, i.e., simultaneous decoding. It declares that
$\hat{w}_c$ is sent if there exist $t^B = (t_1, \ldots, t_B) \in [1 : M]^B$, $w_1 \in [1 : M_1]$ and $j^B = (j_1, \ldots, j_B) \in [1 : J]^B$ such that $\mathbf{x}_{2,i}(\hat{w}_c, t_{i-1})$,
$\mathbf{u}_i(\hat{w}_c, t_{i-1}, t_i, w_1, j_i)$, $\mathbf{v}_i(\hat{w}_c, t_{i-1}, t_i)$ and $\mathbf{y}[i]$ are jointly typical for all $i = 1, \ldots, B$. One can show that the decoder obtains
the correct $w_c$ as long as $n$ and $B$ are large and

$$R_c + R_1 \le I(U,V,X_2;Y) - I(U,V,X_2;S). \qquad (16)$$

Step (b): Next, the decoder estimates message $w_1$ using again all blocks $i = 1, \ldots, B$, i.e., simultaneous decoding.
It declares that $\hat{w}_1$ is sent if there exist $t^B = (t_1, \ldots, t_B) \in [1 : M]^B$ and $j^B = (j_1, \ldots, j_B) \in [1 : J]^B$ such that $\mathbf{x}_{2,i}(\hat{w}_c, t_{i-1})$,
$\mathbf{u}_i(\hat{w}_c, t_{i-1}, t_i, \hat{w}_1, j_i)$, $\mathbf{v}_i(\hat{w}_c, t_{i-1}, t_i)$ and $\mathbf{y}[i]$ are jointly typical for all $i = 1, \ldots, B$. One can show that the decoder obtains
the correct $w_1$ as long as $n$ and $B$ are large and

$$R_1 \le I(U;Y|V,X_2) - I(U;S|V,X_2) \qquad (17a)$$

$$R_1 \le I(U,V,X_2;Y) - I(U,V,X_2;S). \qquad (17b)$$

□ 

In the coding scheme of Theorem 1, the state compression is standard, i.e., uses no Wyner-Ziv binning, the same
message is sent in every block, and the decoding of the sent message is performed jointly using all blocks. Although
of no benefit in the case of one relay, the combination of these three features was shown in [30] to be essential in achieving
rates that are strictly larger than those offered by schemes based on Cover and El Gamal's classic compress-and-forward
scheme [37] for certain networks with multiple relays. One can wonder whether
the same holds for our model, i.e., whether schemes based on Cover and El Gamal's classic compress-and-forward,
i.e., block Markov encoding combined with Wyner-Ziv binning, fall short of achieving optimality for our model. In
this paper, we show that the capacity region $\mathcal{C}$ as given by (8) can be achieved alternatively with a coding scheme
that we obtain by building upon and modifying Cover and El Gamal's original compress-and-forward scheme. The
modification consists essentially in 1) decoding block-by-block backward instead of block-by-block forward
and 2) non-unique decoding of the compression indices. (In fact, by investigating more closely the converse proof
of Theorem 1, we will show later that 2) can be relaxed essentially without altering the capacity region.) The
following theorem states the result.

Theorem 2: For the state-dependent multiaccess channel model that we study there exists an optimal coding
scheme that uses Wyner-Ziv binning for the state compression. That is, the capacity region $\mathcal{C}$ given by (8) can also
be achieved using a coding scheme in which the state compression is performed using Wyner-Ziv binning.

Proof: The achievability proof of Theorem 2 is based on a block-Markovian coding scheme that carefully combines
Gel'fand-Pinsker binning and Wyner-Ziv binning, and utilizes backward decoding with non-unique
decoding of the compression indices. The complete proof of Theorem 2 is given in Appendix C.

As we mentioned previously, the coding scheme of Theorem 2 shares elements with Cover and El Gamal's
original compress-and-forward [37, Theorem 7], but differs from it mainly in two aspects. First, it uses backward
decoding instead of the forward decoding of [37]; and, second, unlike [37], it does not require unique decoding
of the compression indices. The second aspect is essential for getting the same rate expression as in (8), with no
additional constraints. However, as we will see shortly in the corollary that follows, one can modify the coding
scheme of Theorem 2 so that the compression indices are decoded uniquely and still get the capacity region, at
the expense of slightly larger $|\mathcal{V}|$ and larger $|\mathcal{U}|$. The key element is the observation that the constraint introduced
by getting the compression index decoded, i.e., (see Appendix D)

$$I(V;S|X_2) - I(V;Y|X_2) \le I(X_2;Y), \qquad (18)$$

or, equivalently,

$$I(V,X_2;Y) - I(V,X_2;S) \ge 0, \qquad (19)$$

is also implicit in the converse proof of Theorem 1. That is, the auxiliary random variables $U$ and $V$ of the converse
proof of Theorem 1 in Appendix B satisfy (19).

Corollary 1: The coding scheme of Theorem 2 can be modified in a way to get the compression index decoded.
The resulting coding scheme is optimal and achieves an equivalent characterization of the capacity region of the
model that we study given by the set of all rate pairs $(R_c, R_1)$ such that

$$R_1 \le I(U;Y|V,X_2) - I(U;S|V,X_2)$$

$$R_c + R_1 \le I(U,V,X_2;Y) - I(U,V,X_2;S) \qquad (20)$$

for some measure $(S, U, V, X_1, X_2, Y) \in \mathcal{P}$ and satisfying

$$I(V,X_2;Y) - I(V,X_2;S) \ge 0, \qquad (21)$$

where the auxiliary random variables $V$ and $U$ have their alphabets bounded as

$$|\mathcal{V}| \le |\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2| + 2 \qquad (22a)$$




(22b)

Proof: The coding scheme that we use for the proof of Corollary 1 is very similar to that of Theorem 2, but with
unique decoding of the compression indices. The details of the proof are given in Appendix D.

We now establish an alternative outer bound on the capacity region of the DM MAC model that we study. This
outer bound will turn out to be useful in the proof of the converse part of the coding theorem for the Gaussian case
in Section IV since, as will be shown, it is also achievable in that case.

Theorem 3: The capacity region of the multiple access channel with states known non-causally at one encoder
and strictly causally at the other encoder is contained in the closure of the set of all rate pairs $(R_c, R_1)$ satisfying

$$\begin{aligned}
R_1 &\le I(X_1;Y|S,X_2) \\
R_c + R_1 &\le I(X_1,X_2;Y|S) - I(X_2;S|Y),
\end{aligned} \tag{23}$$

for some probability distribution of the form

$$P_{S,X_1,X_2,Y} = Q_S P_{X_2} P_{X_1|X_2,S} W_{Y|X_1,X_2,S}. \tag{24}$$

Proof: The proof of Theorem 3 appears in Appendix E.

Remark 2: In [5] the authors use an extension of the converse part of the proof of the standard Gel'fand-Pinsker
capacity to establish a converse proof for the model with states $S^n$ known non-causally at Encoder 1 and no states
at all at Encoder 2. Then, they show that their outer bound, which involves an auxiliary random variable, is itself
contained in the region defined by (23). In Appendix E, we provide a direct proof that the region defined by (23) is
an outer bound on the capacity region of the more general model that we study here. Our converse proof accounts
also for the availability of the states at Encoder 2 in a strictly causal manner. □

B. Example 

In Section III-A, we have shown that the capacity region $\mathcal{C}$ of the model of Figure 1 is potentially larger than that,
$\tilde{\mathcal{C}}$, of the same model but with Encoder 2 being totally unaware of the states, i.e., $\tilde{\mathcal{C}} \subseteq \mathcal{C}$. In this section, we show
that this inclusion can be strict, i.e., $\tilde{\mathcal{C}} \subsetneq \mathcal{C}$.

We use $h(a)$ to denote the entropy of a Bernoulli($a$) source, i.e.,

$$h(a) = -a\log(a) - (1-a)\log(1-a) \tag{25}$$

and $p * q$ to denote the binary convolution, i.e.,

$$p * q = p(1-q) + q(1-p). \tag{26}$$

Consider the binary memoryless MAC shown in Figure 2. Here, all the random variables are binary, taking values in $\{0,1\}$. The
channel has two output components, i.e., $Y^n = (Y_1^n, Y_2^n)$. The component $Y_2^n$ is deterministic, $Y_2^n = X_2^n$, and the
component $Y_1^n = X_1^n + S^n + Z^n$, where the addition is modulo 2. Encoder 2 knows the states only strictly causally
and has no message to transmit. Encoder 1 knows the states non-causally and transmits an individual message
$W_1$. The state and noise vectors are independent and memoryless, with the state process $S_i$, $i \ge 1$, and the noise
process $Z_i$, $i \ge 1$, assumed to be Bernoulli(1/2) and Bernoulli($p$) processes, respectively. The vectors $X_1^n$ and $X_2^n$ are
the channel inputs, subjected to the constraints

$$\sum_{i=1}^{n} X_{1,i} \le n q_1 \quad \text{and} \quad \sum_{i=1}^{n} X_{2,i} \le n q_2. \tag{27}$$

Fig. 2. Binary state-dependent MAC example with two output components, $Y^n = (Y_1^n, Y_2^n)$, with $Y_1^n = X_1^n + S^n + Z^n$
and $Y_2^n = X_2^n$.

For this example, as we will show shortly, the strictly causal knowledge of the states at Encoder 2 does help, and
in fact Encoder 1 can transmit at rates that are larger than the standard Gel'fand-Pinsker rate $I(U;Y_1) - I(U;S)$, which
would be the capacity had Encoder 2 been of no help.

Claim 1: The capacity of the state-dependent binary memoryless MAC shown in Figure 2 is given by

$$C_B = \max_{p(x_1|s)} I(X_1;Y_1|S). \tag{28}$$

Proof: 1) The achievability follows from Theorem 1 as follows. Set $R_c = 0$ and $V = S$, $U = X_1$, $Y_2 = X_2$ with $X_2$
independent of $(S, X_1)$ in Theorem 1. Evaluating the first inequality, we obtain

$$\begin{aligned}
R_1 &< I(U;Y|V,X_2) - I(U;S|V,X_2) && (29)\\
&= I(X_1;Y_1,X_2|S,X_2) && (30)\\
&= I(X_1;Y_1|S,X_2) && (31)\\
&= I(X_1,X_2;Y_1|S) - I(X_2;Y_1|S) && (32)\\
&= I(X_1;Y_1|S) + I(X_2;Y_1|X_1,S) - I(X_2;Y_1|S) && (33)\\
&= I(X_1;Y_1|S) - I(X_2;Y_1|S) && (34)\\
&= I(X_1;Y_1|S), && (35)
\end{aligned}$$

where (34) follows since $X_2 = Y_2$ and $Y_2 \leftrightarrow (X_1,S) \leftrightarrow Y_1$ is a Markov chain, and the last equality follows by the
Markov relation $X_2 \leftrightarrow S \leftrightarrow Y_1$ for this example.

Evaluating the second inequality, we obtain

$$\begin{aligned}
R_1 &< I(U,V,X_2;Y) - I(U,V,X_2;S) && (36)\\
&= I(X_1,S;Y_1,X_2) + H(X_2|X_1,S) - H(S) && (37)\\
&= I(X_1,S;Y_1) + I(X_1,S;X_2|Y_1) + H(X_2|X_1,S) - H(S) && (38)\\
&= I(X_1,S;Y_1) + H(X_2|Y_1) - H(X_2|X_1,S,Y_1) + H(X_2|X_1,S) - H(S) && (39)\\
&= I(X_1;Y_1|S) + I(S;Y_1) + H(X_2|Y_1) - H(S) && (40)\\
&= I(X_1;Y_1|S) + H(X_2|Y_1) - H(S|Y_1) && (41)\\
&= I(X_1;Y_1|S) + H(Y_1|X_2) - H(Y_1|S) + H(X_2) - H(S) && (42)\\
&= I(X_1;Y_1|S) + I(S;Y_1) + H(X_2) - H(S), && (43)
\end{aligned}$$

where (40) follows since $X_2$ is independent of $(X_1,S,Y_1)$.

Now, observe that with the choice $X_2 \sim$ Bernoulli(1/2) independent of $(S,X_1)$, we have $H(X_2) = H(S) = 1$ and, so,
the RHS of (43) is at least as large as the RHS of (35), because $I(S;Y_1) \ge 0$. This shows the achievability of the rate $R_1 = I(X_1;Y_1|S)$.

2) The converse follows straightforwardly by specializing Theorem 2 (or the cut-set upper bound) to this
example,

$$\begin{aligned}
R_1 &\le I(X_1;Y|X_2,S) && (44)\\
&= I(X_1;Y_1|X_2,S) && (45)\\
&= H(Y_1|X_2,S) - H(Y_1|X_1,X_2,S) && (46)\\
&\le H(Y_1|S) - H(Y_1|X_1,X_2,S) && (47)\\
&\le H(Y_1|S) - H(Y_1|X_1,S) && (48)\\
&= I(X_1;Y_1|S), && (49)
\end{aligned}$$

where (47) holds since conditioning reduces entropy, and (49) holds by the Markov relation $X_2 \leftrightarrow (X_1,S) \leftrightarrow Y_1$.
Claim 2: The capacity of the state-dependent binary memoryless MAC shown in Figure 2 satisfies

$$C_B = h(p * q_1) - h(p) \ge \max_{p(u,x_1|s)} I(U;Y_1) - I(U;S). \tag{50}$$

Proof: Claim 2 is a simple consequence of Claim 1 and known results on the capacity of the binary dirty paper
channel (see for example [39] and references therein). More specifically, the capacity $C_B$ in Claim 1 is that of a
point-to-point state-dependent additive binary channel with a Bernoulli(1/2) state known at both transmitter and
receiver ends, a Bernoulli($p$) noise representing the binary symmetric channel and average input constraint $q_1$ at
the transmitter. Thus, an explicit characterization of $C_B$ is given by [39]

$$C_B = h(p * q_1) - h(p). \tag{51}$$

Let now $R_{GP}$ be the maximum achievable rate had the strictly causal part $S^{i-1}$ of the state been of no utility, or
equivalently had Encoder 2 been of no help. $R_{GP}$ is the capacity of a binary dirty paper channel, given by [39]

$$R_{GP} = \max_{p(u,x_1|s)} I(U;Y_1) - I(U;S) =
\begin{cases}
G(q_1) & \text{if } p^* \le q_1 \le \frac{1}{2} \\
q_1 \log\left(\frac{1-p^*}{p^*}\right) & \text{if } 0 \le q_1 \le p^*
\end{cases} \tag{52}$$

where $p^* = 1 - 2^{-h(p)}$ and the function $G(q)$, defined for $q \in [0, 1/2]$, is given by

$$G(q) =
\begin{cases}
h(q) - h(p) & \text{if } p \le q \le \frac{1}{2} \\
0 & \text{if } 0 \le q < p.
\end{cases} \tag{53}$$

Observing that $h(p * q_1) \ge h(q_1)$ for all $0 \le q_1 \le 1/2$, it is easy to see that $C_B \ge R_{GP}$.
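The comparison between $C_B$ and $R_{GP}$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, logarithms are taken to base 2, and the low-cost branch of `r_gp` assumes the usual tangent-line (time-sharing) form of the binary dirty paper capacity, with slope $\log((1-p^*)/p^*)$. The grid search evaluates (28) directly, using the fact that, given $S = s$, the output satisfies $Y_1 + s = X_1 + Z$ modulo 2.

```python
import math


def h(a: float) -> float:
    """Binary entropy in bits, as in (25), with h(0) = h(1) = 0."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)


def conv(p: float, q: float) -> float:
    """Binary convolution p * q, as in (26)."""
    return p * (1.0 - q) + q * (1.0 - p)


def c_b(p: float, q1: float) -> float:
    """Closed form (51): state known at both transmitter and receiver."""
    return h(conv(p, q1)) - h(p)


def c_b_grid(p: float, q1: float, steps: int = 100) -> float:
    """Grid search for max I(X1;Y1|S) in (28).

    a0 and a1 stand for P(X1=1|S=0) and P(X1=1|S=1). With S Bernoulli(1/2),
    the input constraint reads (a0 + a1) / 2 <= q1, and
    I(X1;Y1|S=s) = h(p * a_s) - h(p).
    """
    best = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            a0, a1 = i / steps, j / steps
            if (a0 + a1) / 2.0 <= q1 + 1e-12:
                rate = 0.5 * (h(conv(p, a0)) + h(conv(p, a1))) - h(p)
                best = max(best, rate)
    return best


def r_gp(p: float, q1: float) -> float:
    """Binary dirty paper rate; the low-cost branch is an assumed form."""
    p_star = 1.0 - 2.0 ** (-h(p))
    if q1 >= p_star:
        return h(q1) - h(p)
    # Assumption: straight line through the origin, tangent to G at p_star.
    return q1 * math.log2((1.0 - p_star) / p_star)


if __name__ == "__main__":
    p = 0.1
    for q1 in (0.05, 0.15, 0.3, 0.5):
        print(f"q1={q1:.2f}  C_B={c_b(p, q1):.4f}  R_GP={r_gp(p, q1):.4f}")
```

Under these assumptions the gap $C_B - R_{GP}$ is strictly positive for intermediate values of $q_1$ and closes at $q_1 = 0$ and $q_1 = 1/2$.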

Remark 3: In this example, the encoder that knows the states only strictly causally simply conveys these states
to the receiver, noiselessly. The receiver then becomes aware of the channel states fully (since the delay in learning
these states at the decoder has no impact on the capacity). This explains why Encoder 1 can transmit at rates
that can be strictly larger than the standard Gel'fand-Pinsker rate (52), and in fact achieves the capacity (50) of a
state-dependent additive binary channel with the states known at both transmitter and receiver ends. □

C. Common-message Capacity 

In this section, we study the important case in which the two encoders transmit only the common message,
i.e., $R_1 = 0$. The following corollary characterizes the capacity in this case, to which we refer as the common-message
capacity.

Corollary 2: The common-message capacity, $C$, of the multiple access channel with common message and states
known non-causally at one encoder and strictly causally at the other encoder is given by

$$C = \max\, I(V,X_2;Y) - I(V,X_2;S), \tag{54}$$

where the maximization is over joint measures $P_{S,V,X_1,X_2,Y}$ of the form

$$P_{S,V,X_1,X_2,Y} = Q_S P_{X_2} P_{V,X_1|S,X_2} W_{Y|X_1,X_2,S}. \tag{55}$$

Proof: The proof of Corollary 2 appears in Appendix F.

Remark 4: The common-message capacity of our model in Corollary 2 coincides with the common-message capacity of
the model with the state sequence $S^n$ known noncausally at Encoder 1 and not at all at Encoder 2 [5]. That is, $C$
can also be obtained by relaxing the constraint on $R_1$ in the region $\tilde{\mathcal{C}}$ defined by (10) and (11). This shows that
the knowledge of the states at Encoder 2 only strictly causally does not increase the common-message capacity.
We should, however, note that this result is not a direct consequence of the fact that, in a MAC, a state that is known only
strictly causally at all encoders does not increase the capacity; and, so, the converse proof is needed here. □

IV. Memoryless Gaussian Case 

In this section, we consider a two-user state-dependent Gaussian MAC in which the channel states and the noise
are additive and Gaussian.

A. Channel Model

As in Section II, we assume that Encoder 1 knows the channel states non-causally and Encoder 2 knows the
channel states strictly causally. The two encoders send some common message $W_c$; and, in addition, Encoder 1
sends an individual message $W_1$. At time instant $i$, the channel output $Y_i$ is related to the channel inputs $X_{1,i}$ and $X_{2,i}$
from the two encoders, the channel state $S_i$ and the noise $Z_i$ by

$$Y_i = X_{1,i} + X_{2,i} + S_i + Z_i, \tag{56}$$

where $S_i$ and $Z_i$ are zero-mean Gaussian random variables with variance $Q$ and $N$, respectively. The random
variables $S_i$ and $Z_i$ at time instant $i \in \{1, \ldots, n\}$ are mutually independent, and independent from $(S_j, Z_j)$ for $j \ne i$.
Also, at time $i$, the input $X_{2,i}$ is independent from the state $S_i$.

We consider the individual power constraints on the transmitted power

$$\sum_{i=1}^{n} X_{1,i}^2 \le n P_1, \qquad \sum_{i=1}^{n} X_{2,i}^2 \le n P_2. \tag{57}$$

The definition of a code for this channel is the same as given in Section II, with the additional power constraints (57).

B. Capacity Region 

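The following theorem characterizes the capacity region of the studied Gaussian model.

Theorem 4: The capacity region of the Gaussian model (56) is given by the set of all the rate pairs $(R_c, R_1)$
satisfying

$$\begin{aligned}
R_1 &\le \frac{1}{2}\log\left(1 + \frac{P_1(1-\rho_{12}^2-\rho_{1s}^2)}{N}\right) \\
R_c + R_1 &\le \frac{1}{2}\log\left(1 + \frac{(\sqrt{P_2} + \rho_{12}\sqrt{P_1})^2}{P_1(1-\rho_{12}^2-\rho_{1s}^2) + (\sqrt{Q} + \rho_{1s}\sqrt{P_1})^2 + N}\right) + \frac{1}{2}\log\left(1 + \frac{P_1(1-\rho_{12}^2-\rho_{1s}^2)}{N}\right),
\end{aligned} \tag{58}$$

where the maximization is over $\rho_{12} \in [0,1]$, $\rho_{1s} \in [-1,0]$ such that

$$\rho_{12}^2 + \rho_{1s}^2 \le 1. \tag{59}$$

Proof: An outline of the proof of Theorem 4 is given in Appendix G.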

Remark 5: The capacity region of our model in Theorem 4 coincides with that of the model (56) but with the
state sequence $S^n$ known noncausally at Encoder 1 and not at all at Encoder 2 [5, Theorem 7]. Then, an implication of
Theorem 4 is that it is optimal for our model to just ignore the states $S^{i-1}$ that are known at Encoder 2 and use the
coding scheme of [5]. That is, the availability of the states only strictly causally at the encoder that sends only the
common message in our model does not increase the capacity region any further. While one could expect some
utility of the collaborative transmission of a lossy version of the state to the decoder as in the memoryless discrete
setup (and also in the Gaussian setups of [15], [16] and [19]), a direct consequence of our converse proof is that
this would be of no help, in the sense that it would not result in better transmission rates. This can be interpreted
as follows. As can be seen from the proof of Theorem 1, the joint transmission of the state to the decoder aims
at equipping it with an estimate of this state. This state estimate is then utilized as decoder side information for
the decoding of the information messages. In the discrete memoryless case, this can be beneficial in general for the
transmission of the private message, not the common message, as we already mentioned. In the Gaussian case,
however, for the transmission of the private message, Encoder 1 knows the state non-causally and, therefore, it can
cancel its effect completely using a variation of the standard dirty paper scheme [40], with no need to diminish
its effect via the joint transmission of the compressed version of the state. □

The following corollary follows straightforwardly from Theorem 4.

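Corollary 3: The common-message capacity, $C_G$, of the Gaussian model (56) is given by

$$C_G = \max\; \frac{1}{2}\log\left(1 + \frac{(\sqrt{P_2} + \rho_{12}\sqrt{P_1})^2}{P_1(1-\rho_{12}^2-\rho_{1s}^2) + (\sqrt{Q} + \rho_{1s}\sqrt{P_1})^2 + N}\right) + \frac{1}{2}\log\left(1 + \frac{P_1(1-\rho_{12}^2-\rho_{1s}^2)}{N}\right), \tag{60}$$

where the maximization is over $\rho_{12} \in [0,1]$, $\rho_{1s} \in [-1,0]$ such that

$$\rho_{12}^2 + \rho_{1s}^2 \le 1. \tag{61}$$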
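To get a feel for the Gaussian rate expressions, the following sketch evaluates the two bounds of Theorem 4 and approximates the common-message capacity of Corollary 3 by a grid search over $(\rho_{12}, \rho_{1s})$. It is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, logarithms are to base 2, and the numerator $(\sqrt{P_2} + \rho_{12}\sqrt{P_1})^2$ of the first term is assumed to be the coherent-combining power of the common signal, as in the scheme of [5].

```python
import math


def residual_power(p1: float, rho12: float, rho1s: float) -> float:
    """Power Encoder 1 has left after correlating its input with X2 and S."""
    return max(0.0, p1 * (1.0 - rho12 ** 2 - rho1s ** 2))


def r1_bound(p1: float, n0: float, rho12: float, rho1s: float) -> float:
    """Individual-rate bound of Theorem 4 (bits per channel use)."""
    return 0.5 * math.log2(1.0 + residual_power(p1, rho12, rho1s) / n0)


def sum_bound(p1: float, p2: float, q: float, n0: float,
              rho12: float, rho1s: float) -> float:
    """Sum-rate bound of Theorem 4 for a given correlation pair."""
    res = residual_power(p1, rho12, rho1s)
    # Assumed numerator: coherent combining of X2 with the part of X1 aligned to it.
    coherent = (math.sqrt(p2) + rho12 * math.sqrt(p1)) ** 2
    # Residual state power after partial cancellation by Encoder 1 (rho1s <= 0).
    state = (math.sqrt(q) + rho1s * math.sqrt(p1)) ** 2
    return (0.5 * math.log2(1.0 + coherent / (res + state + n0))
            + 0.5 * math.log2(1.0 + res / n0))


def common_message_capacity(p1: float, p2: float, q: float, n0: float,
                            steps: int = 200) -> float:
    """Grid approximation of the maximization in Corollary 3."""
    best = 0.0
    for i in range(steps + 1):
        rho12 = i / steps
        for j in range(steps + 1):
            rho1s = -j / steps
            if rho12 ** 2 + rho1s ** 2 <= 1.0:
                best = max(best, sum_bound(p1, p2, q, n0, rho12, rho1s))
    return best


if __name__ == "__main__":
    print(common_message_capacity(p1=1.0, p2=1.0, q=1.0, n0=1.0))
```

As a sanity check on the assumed form, setting $Q = 0$ makes the grid maximum equal to $\frac{1}{2}\log_2(1 + (\sqrt{P_1}+\sqrt{P_2})^2/N)$, the rate of two fully cooperating transmitters on a state-free channel.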

V. Conclusions 

In this paper, we consider a state-dependent multiaccess channel with the channel state available noncausally
at one of the encoders and only strictly causally at the other encoder. The decoder is not aware of the channel
state. Both encoders transmit a common message and, in addition, Encoder 1, the encoder that knows the state
noncausally, transmits an individual message. We study the capacity region of this communication model. The
analysis also helps in understanding the utility of revealing the state only strictly causally to the encoder that sends
only the common message, as well as the optimal ways of compressing the state.

In the discrete memoryless case, we characterize the capacity region of this model with a single-letter expression. 
In particular, the analysis reveals optimal ways of exploiting the knowledge of the state only strictly causally at the 
encoder that sends only the common message. The encoders collaborate to convey to the decoder a lossy version
of the state, in addition to transmitting the information messages through a generalized Gel'fand-Pinsker binning.
Particularly important in this problem are the questions of 1) optimal ways of performing the state compression,
and 2) whether or not the compression indices should be decoded uniquely. We develop two optimal coding
schemes that perform the state compression differently. The first coding scheme is à la noisy network coding,
i.e., with no binning and non-unique decoding of the compression indices. The second coding scheme employs
Wyner-Ziv binning with backward decoding and non-unique decoding of the compression indices. We note that
backward decoding and non-unique decoding seem to be key elements for the optimality of the Wyner-Ziv based
coding scheme. Also, we point out that the combination of these two features is likely to be beneficial in other
scenarios in the context of networks with Wyner-Ziv compressions. Next, by exploiting our outer bound and the
involved auxiliary variables specifically, we show that, although not required in general, for our specific model
the compression indices can in fact be decoded uniquely essentially without altering the capacity region but at the
expense of larger alphabet sizes for the auxiliary random variables.

The capacity region contains that of the model of [5], and this shows that revealing the state even only strictly
causally to the encoder that sends only the common message is beneficial and enlarges the capacity region in
general. Furthermore, by investigating a discrete memoryless example, we show that this inclusion can be strict, 
thus demonstrating the utility of conveying a compressed version of the state to the decoder cooperatively by the 
encoders. 

We also specialize our results to the case in which the two encoders send only the common message. We 
characterize the common-message capacity and show that knowing the states only strictly causally at one of the 
encoders is not beneficial in this case. 

Furthermore, we also study the memoryless Gaussian setting in which the channel state and the noise are 
additive and Gaussian. In this case, we establish an operative outer bound on the achievable rate pairs and then 
show that this outer bound is achievable; thus yielding a closed-form expression of the capacity region. Unlike the 
discrete memoryless case, we show that the knowledge of the states only strictly causally at the encoder that sends 
only the common message does not increase the capacity region in this case. 

Appendix 

Throughout this section we denote the set of strongly jointly $\epsilon$-typical sequences [41, Chapter 14.2] with respect
to the distribution $P_{X,Y}$ as $\mathcal{T}_\epsilon^n(P_{X,Y})$.

A. Proof of Proposition 1

Part 1: To prove the convexity of the region, we use a standard argument. We introduce a time-sharing random
variable $T$ and define the joint distribution

$$P_{T,S,U,V,X_1,X_2,Y}(t,s,u,v,x_1,x_2,y) = P_{T,S,U,V,X_1,X_2}(t,s,u,v,x_1,x_2)\, W_{Y|X_1,X_2,S}(y|x_1,x_2,s) \tag{A-1}$$

$$\sum_{u,v,x_1,x_2} P_{T,S,U,V,X_1,X_2}(t,s,u,v,x_1,x_2) = P_T(t)\, Q_S(s). \tag{A-2}$$

Let now $(R_c^T, R_1^T)$ be the common and individual rates resulting from time sharing. Then,

$$\begin{aligned}
R_1^T &\le I(U;Y|V,X_2,T) - I(U;S|V,X_2,T) && \text{(A-3)}\\
&= I(U;Y|\tilde{V},X_2) - I(U;S|\tilde{V},X_2) && \text{(A-4)}\\
R_c^T + R_1^T &\le I(U,V,X_2;Y|T) - I(U,V,X_2;S|T) && \text{(A-5)}\\
&= I(U,V,X_2;Y|T) - I(U,V,X_2,T;S) && \text{(A-6)}\\
&\le I(U,V,X_2,T;Y) - I(U,V,X_2,T;S) && \text{(A-7)}\\
&= I(U,\tilde{V},X_2;Y) - I(U,\tilde{V},X_2;S), && \text{(A-8)}
\end{aligned}$$

where $\tilde{V} := (V,T)$. That is, the time-sharing random variable $T$ is incorporated into the auxiliary random variable
$\tilde{V}$. This shows that time sharing cannot yield rate pairs that are not included in $\mathcal{C}$ and, hence, $\mathcal{C}$ is convex.

Part 2: To prove that the region $\mathcal{C}$ is not altered if one restricts the random variables $U$ and $V$ to have their alphabets
restricted as indicated in (9), we invoke the support lemma [42, p. 310]. Fix a distribution $\mu \in \mathcal{P}$ of $(S,U,V,X_1,X_2,Y)$
and, without loss of generality, let us denote the product set $\mathcal{S}\times\mathcal{X}_1\times\mathcal{X}_2 = \{1,\ldots,m\}$, $m = |\mathcal{S}\times\mathcal{X}_1\times\mathcal{X}_2|$.
To prove the bound (9a) on $|\mathcal{V}|$, note that we have

$$\begin{aligned}
I_\mu(U;Y|V,X_2) &- I_\mu(U;S|V,X_2) \\
&= I_\mu(U,X_2;Y|V) - I_\mu(X_2;Y|V) - I_\mu(U,X_2;S|V) + I_\mu(X_2;S|V) \\
&= H_\mu(U,X_2,S|V) - H_\mu(U,X_2,Y|V) - H_\mu(X_2,S|V) + H_\mu(X_2,Y|V)
\end{aligned} \tag{A-9}$$

and

$$\begin{aligned}
I_\mu(U,V,X_2;Y) &- I_\mu(U,V,X_2;S) \\
&= I_\mu(U,X_2;Y|V) - I_\mu(U,X_2;S|V) + I_\mu(V;Y) - I_\mu(V;S) \\
&= H_\mu(U,X_2,S|V) - H_\mu(U,X_2,Y|V) + H_\mu(Y) - H_\mu(S).
\end{aligned} \tag{A-10}$$

Hence, it suffices to show that the following functionals of $\mu(S,U,V,X_1,X_2,Y)$

$$r_i(\mu) = \mu(s,x_1,x_2), \quad i = 1,\ldots,m-1 \tag{A-11a}$$

$$r_m(\mu) = \int_v d\mu(v)\left[H_\mu(U,X_2,S|v) - H_\mu(U,X_2,Y|v) - H_\mu(X_2,S|v) + H_\mu(X_2,Y|v)\right] \tag{A-11b}$$

$$r_{m+1}(\mu) = \int_v d\mu(v)\left[H_\mu(U,X_2,S|v) - H_\mu(U,X_2,Y|v)\right] \tag{A-11c}$$

can be preserved with another measure $\mu' \in \mathcal{P}$. Observing that there is a total of $(|\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2| + 1)$ functionals in
(A-11), this is ensured by a standard application of the support lemma; and this shows that the alphabet of the
auxiliary random variable $V$ can be restricted as indicated in (9a) without altering the region $\mathcal{C}$.

Once the alphabet of $V$ is fixed, we apply similar arguments to bound the alphabet of $U$, where this time
$(|\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2| + 1)|\mathcal{S}||\mathcal{X}_1||\mathcal{X}_2| - 1$ functionals must be satisfied in order to preserve the joint distribution of $(S,V,X_1,X_2)$, and one more
functional to preserve

$$I_\mu(U;Y|V,X_2) - I_\mu(U;S|V,X_2) = H_\mu(Y,V,X_2) - H_\mu(S,V,X_2) + H_\mu(S,V,X_2|U) - H_\mu(Y,V,X_2|U) \tag{A-12a}$$

$$I_\mu(U,V,X_2;Y) - I_\mu(U,V,X_2;S) = H_\mu(Y) - H_\mu(S) + H_\mu(S,V,X_2|U) - H_\mu(Y,V,X_2|U). \tag{A-12b}$$

This shows that the alphabet of the auxiliary random variable $U$ can be restricted as indicated in (9b) without
altering the region $\mathcal{C}$; and completes the proof of Proposition 1.

B. Proof of Theorem 1

1) Direct Part of Theorem 1: To bound the probability of error, we assume without loss of generality that
the compression indices are all equal to unity, i.e., $t_1 = t_2 = \ldots = t_B = 1$.

We examine the probability of error associated with each of the encoding and decoding procedures. The events
$E_1$, $E_2$ and $E_3$ correspond to encoding errors, and the events $E_4$, $E_5$, $E_6$ and $E_7$ correspond to decoding errors.

• Let $E_1 = \bigcup_{i=1}^{B} E_{1i}$, where $E_{1i}$ is the event that, for the encoding in block $i$, there is no covering codeword
$\mathbf{v}_{i-1}(w_c, t_{i-2}, t_{i-1})$ strongly jointly typical with $\mathbf{s}[i-1]$ given $\mathbf{x}_{2,i-1}(w_c, t_{i-2})$, i.e.,

$$E_1 = \bigcup_{i=1}^{B} \left\{ \nexists\, t_{i-1} \in [1:M] \text{ s.t. } \left(\mathbf{v}_{i-1}(w_c,t_{i-2},t_{i-1}), \mathbf{s}[i-1], \mathbf{x}_{2,i-1}(w_c,t_{i-2})\right) \in \mathcal{T}_\epsilon^n(P_{V,S,X_2}) \right\}. \tag{B-1}$$

For $i \in [1:B]$, the probability that $(\mathbf{s}[i-1], \mathbf{x}_{2,i-1}(w_c,t_{i-2}))$ is not jointly typical goes to zero as $n \to \infty$, by
the asymptotic equipartition property (AEP) [41, p. 384]. Then, for $(\mathbf{s}[i-1], \mathbf{x}_{2,i-1}(w_c,t_{i-2}))$ jointly typical,
the covering lemma [43, Lecture Note 3] ensures that the probability that there is no $t_{i-1} \in [1:M]$ such
that $(\mathbf{v}_{i-1}(w_c,t_{i-2},t_{i-1}), \mathbf{s}[i-1])$ is strongly jointly typical given $\mathbf{x}_{2,i-1}(w_c,t_{i-2})$ is exponentially small for large $n$
provided that the number of covering codewords $\mathbf{v}_{i-1}$ is greater than $2^{nI(V;S|X_2)}$, i.e.,

$$R > I(V;S|X_2). \tag{B-2}$$

Thus, if (B-2) holds, $\Pr(E_{1i}) \to 0$ as $n \to \infty$ and, so, by the union bound over the $B$ blocks, $\Pr(E_1) \to 0$
as $n \to \infty$.

• Let $E_2 = \bigcup_{i=1}^{B} E_{2i}$, where $E_{2i}$ is the event that, for the encoding in block $i$, Encoder 1 can find no covering
codeword $\mathbf{v}_i(w_c,t_{i-1},t_i)$ strongly jointly typical with $\mathbf{s}[i]$ given $\mathbf{x}_{2,i}(w_c,t_{i-1})$. Similarly to the event $E_1$, it is easy
to see that $\Pr(E_2) \to 0$ as $n \to \infty$ if (B-2) is true.

• Let $E_3 = \bigcup_{i=1}^{B} E_{3i}$, where $E_{3i}$ is the event that, for the encoding in block $i$, there is no sequence $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i)$
jointly typical with $\mathbf{s}[i]$ given $\mathbf{x}_{2,i}(w_c,t_{i-1})$ and $\mathbf{v}_i(w_c,t_{i-1},t_i)$, i.e.,

$$E_3 = \bigcup_{i=1}^{B} \left\{ \nexists\, j_i \in [1:J] \text{ s.t. } \left(\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i), \mathbf{s}[i], \mathbf{v}_i(w_c,t_{i-1},t_i), \mathbf{x}_{2,i}(w_c,t_{i-1})\right) \in \mathcal{T}_\epsilon^n(P_{U,S,V,X_2}) \right\}. \tag{B-3}$$

To bound the probability of the event $E_{3i}$, we use a standard argument [3]. More specifically, conditioned on
$E_{1i}^c$ and $E_{2i}^c$, the complement events of $E_{1i}$ and $E_{2i}$, respectively, we have that the state $\mathbf{s}[i]$ is jointly typical
with $(\mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{v}_i(w_c,t_{i-1},t_i))$. Then, for $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i)$ generated independently of $\mathbf{s}[i]$ given $\mathbf{x}_{2,i}(w_c,t_{i-1})$
and $\mathbf{v}_i(w_c,t_{i-1},t_i)$, with i.i.d. components drawn according to $P_{U|V,X_2}$, the probability that $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i)$ is
jointly typical with $\mathbf{s}[i]$ given $\mathbf{x}_{2,i}(w_c,t_{i-1})$ and $\mathbf{v}_i(w_c,t_{i-1},t_i)$ is greater than $(1-\epsilon)2^{-n(I(U;S|V,X_2)+\epsilon)}$ for sufficiently
large $n$. There is a total of $J$ such $\mathbf{u}_i$'s in each bin. Conditioned on $E_{1i}^c$ and $E_{2i}^c$, the probability of the event $E_{3i}$,
the probability that there is no such $\mathbf{u}_i$, is therefore bounded as

$$\Pr(E_{3i}|E_{1i}^c,E_{2i}^c) \le \left[1 - (1-\epsilon)2^{-n(I(U;S|V,X_2)+\epsilon)}\right]^{J}. \tag{B-4}$$

Taking the logarithm on both sides of (B-4) and substituting $J$, we obtain that $\ln(\Pr(E_{3i}|E_{1i}^c,E_{2i}^c)) \le -(1-\epsilon)2^{n(\delta-1)\epsilon}$.
Thus, $\Pr(E_{3i}|E_{1i}^c,E_{2i}^c) \to 0$ as $n \to \infty$ and, so, by the union bound, $\Pr(E_3|E_1^c,E_2^c) \to 0$ as $n \to \infty$.

• For the decoding of the common message $w_c$ at the receiver, let $E_4 = \bigcup_{i=1}^{B} E_{4i}$, where $E_{4i}$ is the event that
$(\mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i^*), \mathbf{v}_i(w_c,t_{i-1},t_i), \mathbf{y}[i])$ is not jointly typical, i.e.,

$$E_4 = \bigcup_{i=1}^{B} \left\{ \left(\mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i^*), \mathbf{v}_i(w_c,t_{i-1},t_i), \mathbf{y}[i]\right) \notin \mathcal{T}_\epsilon^n(P_{X_2,U,V,Y}) \right\}. \tag{B-5}$$

Conditioned on $E_{1i}^c$, $E_{2i}^c$ and $E_{3i}^c$, the vectors $\mathbf{s}[i]$, $\mathbf{x}_{2,i}(w_c,t_{i-1})$, $\mathbf{v}_i(w_c,t_{i-1},t_i)$ and $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i^*)$ are jointly
typical, and jointly typical with $\mathbf{x}_1[i]$. Then, conditioned on $E_{1i}^c$, $E_{2i}^c$ and $E_{3i}^c$, the vectors $\mathbf{s}[i]$, $\mathbf{x}_{2,i}(w_c,t_{i-1})$, $\mathbf{v}_i(w_c,t_{i-1},t_i)$,
$\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i^*)$ and $\mathbf{y}[i]$ are jointly typical by the Markov lemma [41, p. 436], i.e., $\Pr(E_{4i}|E_{1i}^c,E_{2i}^c,E_{3i}^c) \to 0$
as $n \to \infty$. Thus, by the union bound over the $B$ blocks, $\Pr(E_4|E_1^c,E_2^c,E_3^c) \to 0$ as $n \to \infty$.

For the decoding of the common message $w_c$ at the receiver, let $E_5$ be the event that $\mathbf{x}_{2,i}(w_c',t_{i-1})$, $\mathbf{u}_i(w_c',t_{i-1},t_i,w_1,j_i)$,
$\mathbf{v}_i(w_c',t_{i-1},t_i)$ and $\mathbf{y}[i]$ are jointly typical for all $i = 1,\ldots,B$ and some $w_c' \in [1:M_c]$, $w_1 \in [1:M_1]$,
$t^B = (t_1,\ldots,t_B) \in [1:M]^B$ and $j^B = (j_1,\ldots,j_B) \in [1:J]^B$ such that $w_c' \ne w_c$, i.e.,

$$E_5 = \Big\{ \exists\, w_c' \in [1:M_c],\, w_1 \in [1:M_1],\, t^B \in [1:M]^B,\, j^B \in [1:J]^B \text{ s.t. } w_c' \ne w_c,\; \bigcap_{i=1}^{B} \left\{ \left(\mathbf{x}_{2,i}(w_c',t_{i-1}), \mathbf{u}_i(w_c',t_{i-1},t_i,w_1,j_i), \mathbf{v}_i(w_c',t_{i-1},t_i), \mathbf{y}[i]\right) \in \mathcal{T}_\epsilon^n(P_{X_2,U,V,Y}) \right\} \Big\}. \tag{B-6}$$

To bound the probability of the event $E_5$, define the following event for given $w_c' \in [1:M_c]$, $w_1 \in [1:M_1]$,
$(t_{i-1},t_i) \in [1:M]^2$ and $j_i \in [1:J]$ such that $w_c' \ne w_c$,

$$E_{5i}(w_c',t_{i-1},t_i,w_1,j_i) = \left\{ \left(\mathbf{x}_{2,i}(w_c',t_{i-1}), \mathbf{u}_i(w_c',t_{i-1},t_i,w_1,j_i), \mathbf{v}_i(w_c',t_{i-1},t_i), \mathbf{y}[i]\right) \in \mathcal{T}_\epsilon^n(P_{X_2,U,V,Y}) \right\}.$$

Note that for $w_c' \ne w_c$ the vectors $\mathbf{x}_{2,i}(w_c',t_{i-1})$, $\mathbf{v}_i(w_c',t_{i-1},t_i)$ and $\mathbf{u}_i(w_c',t_{i-1},t_i,w_1,j_i)$ are generated independently
of $\mathbf{y}[i]$. Hence, by the joint typicality lemma [43, Lecture Note 2], we get

$$\Pr(E_{5i}(w_c',t_{i-1},t_i,w_1,j_i)\,|\,E_1^c,E_2^c,E_3^c,E_4^c) \le 2^{-n[I(U,V,X_2;Y)-\epsilon]}. \tag{B-7}$$

Then, conditioned on the events $E_1^c$, $E_2^c$, $E_3^c$ and $E_4^c$, the probability of the event $E_5$ can be bounded as

$$\begin{aligned}
\Pr(E_5|E_1^c,E_2^c,E_3^c,E_4^c)
&= \Pr\Big( \bigcup_{w_c' \ne w_c} \bigcup_{w_1 \in [1:M_1]} \bigcup_{t^B \in [1:M]^B} \bigcup_{j^B \in [1:J]^B} \bigcap_{i=1}^{B} E_{5i}(w_c',t_{i-1},t_i,w_1,j_i) \,\Big|\, E_1^c,E_2^c,E_3^c,E_4^c \Big) \\
&\overset{(a)}{\le} \sum_{w_c' \ne w_c} \sum_{w_1} \sum_{t^B} \sum_{j^B} \Pr\Big( \bigcap_{i=1}^{B} E_{5i}(w_c',t_{i-1},t_i,w_1,j_i) \,\Big|\, E_1^c,E_2^c,E_3^c,E_4^c \Big) \\
&\overset{(b)}{=} \sum_{w_c' \ne w_c} \sum_{w_1} \sum_{t^B} \sum_{j^B} \prod_{i=1}^{B} \Pr\big( E_{5i}(w_c',t_{i-1},t_i,w_1,j_i) \,\big|\, E_1^c,E_2^c,E_3^c,E_4^c \big) \\
&\le \sum_{w_c' \ne w_c} \sum_{w_1} \sum_{t^B} \sum_{j^B} \prod_{i=2}^{B} \Pr\big( E_{5i}(w_c',t_{i-1},t_i,w_1,j_i) \,\big|\, E_1^c,E_2^c,E_3^c,E_4^c \big) \\
&\overset{(c)}{\le} \sum_{w_c' \ne w_c} \sum_{w_1} \sum_{t^B} \sum_{j^B} \prod_{i=2}^{B} 2^{-n[I(U,V,X_2;Y)-\epsilon]} \\
&\le M_c M_1 M J\, 2^{-n(B-1)[I(U,V,X_2;Y) - I(U;S|V,X_2) - R - (\eta+\delta+1)\epsilon]},
\end{aligned} \tag{B-8}$$

where: (a) follows by the union bound; (b) follows since the codebook is generated independently for each
block $i \in [1:B]$ and the channel is memoryless; and (c) follows by (B-7).

The right hand side (RHS) of (B-8) tends to zero as $n \to \infty$ if

$$R_c + R_1 < \frac{B-1}{B}\left(I(U,V,X_2;Y) - I(U;S|V,X_2) - R\right) - \frac{R}{B} - \frac{I(U;S|V,X_2)}{B}. \tag{B-9}$$

Finally, using (B-2) to eliminate $R$ from (B-9) and taking $B \to \infty$, we get $\Pr(E_5|E_1^c,E_2^c,E_3^c,E_4^c) \to 0$ as long as

$$\begin{aligned}
R_c + R_1 &< I(U,V,X_2;Y) - I(U,V;S|X_2) \\
&= I(U,V,X_2;Y) - I(U,V,X_2;S),
\end{aligned} \tag{B-10}$$

where the last equality follows since $X_2$ and $S$ are independent.

For the decoding of the individual message $w_1$ at the receiver, let $E_6 = \bigcup_{i=1}^{B} E_{6i}$, where $E_{6i}$ is the event that
$\mathbf{x}_{2,i}(w_c,t_{i-1})$, $\mathbf{v}_i(w_c,t_{i-1},t_i)$, $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i^*)$ and $\mathbf{y}[i]$ are not jointly typical, i.e.,

$$E_{6i} = \left\{ \left(\mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{v}_i(w_c,t_{i-1},t_i), \mathbf{u}_i(w_c,t_{i-1},t_i,w_1,j_i^*), \mathbf{y}[i]\right) \notin \mathcal{T}_\epsilon^n(P_{X_2,V,U,Y}) \right\}. \tag{B-11}$$

From our analysis of the probability of the error event $E_4$, it is easy to see that, conditioned on $E_1^c$, $E_2^c$ and $E_3^c$, the
event $E_{6i}$ has exponentially small probability. Thus, by the union bound over the $B$ blocks, $\Pr(E_6|E_1^c,E_2^c,E_3^c) \to 0$
as $n \to \infty$.

For the decoding of the individual message $w_1$ at the receiver, let $E_7$ be the event that $\mathbf{x}_{2,i}(w_c,t_{i-1})$, $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1',j_i)$,
$\mathbf{v}_i(w_c,t_{i-1},t_i)$ and $\mathbf{y}[i]$ are jointly typical for all $i = 1,\ldots,B$ and some $w_1' \in [1:M_1]$, $t^B = (t_1,\ldots,t_B) \in [1:M]^B$
and $j^B = (j_1,\ldots,j_B) \in [1:J]^B$ such that $w_1' \ne w_1$, i.e.,

$$E_7 = \Big\{ \exists\, w_1' \in [1:M_1],\, t^B \in [1:M]^B,\, j^B \in [1:J]^B \text{ s.t. } w_1' \ne w_1,\; \bigcap_{i=1}^{B} \left\{ \left(\mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{u}_i(w_c,t_{i-1},t_i,w_1',j_i), \mathbf{v}_i(w_c,t_{i-1},t_i), \mathbf{y}[i]\right) \in \mathcal{T}_\epsilon^n(P_{X_2,U,V,Y}) \right\} \Big\}. \tag{B-12}$$

To bound the probability of the event $E_7$, define the following event for given $w_1' \in [1:M_1]$, $(t_{i-1},t_i) \in [1:M]^2$
and $j_i \in [1:J]$,

$$E_{7i}(w_1',t_{i-1},t_i,j_i) = \left\{ \left(\mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{u}_i(w_c,t_{i-1},t_i,w_1',j_i), \mathbf{v}_i(w_c,t_{i-1},t_i), \mathbf{y}[i]\right) \in \mathcal{T}_\epsilon^n(P_{X_2,U,V,Y}) \right\}.$$

Then, writing $\mathcal{E}^c$ for the conditioning events $E_1^c, E_2^c, E_3^c, E_4^c, E_5^c, E_6^c$, we have

$$\begin{aligned}
\Pr(E_7|\mathcal{E}^c)
&= \Pr\Big( \bigcup_{w_1' \ne w_1} \bigcup_{t^B \in [1:M]^B} \bigcup_{j^B \in [1:J]^B} \bigcap_{i=1}^{B} E_{7i}(w_1',t_{i-1},t_i,j_i) \,\Big|\, \mathcal{E}^c \Big) \\
&\overset{(d)}{\le} \sum_{w_1' \ne w_1} \sum_{t^B} \sum_{j^B} \Pr\Big( \bigcap_{i=1}^{B} E_{7i}(w_1',t_{i-1},t_i,j_i) \,\Big|\, \mathcal{E}^c \Big) \\
&\overset{(e)}{=} \sum_{w_1' \ne w_1} \sum_{t^B} \sum_{j^B} \prod_{i=1}^{B} \Pr\big( E_{7i}(w_1',t_{i-1},t_i,j_i) \,\big|\, \mathcal{E}^c \big) \\
&\le \sum_{w_1' \ne w_1} \sum_{t^B} \sum_{j^B} \prod_{i=2}^{B} \Pr\big( E_{7i}(w_1',t_{i-1},t_i,j_i) \,\big|\, \mathcal{E}^c \big),
\end{aligned} \tag{B-13}$$

where: (d) follows by the union bound and (e) follows since the codebook is generated independently for each
block $i \in [1:B]$ and the channel is memoryless.

For $w_1' \ne w_1$, the probability of the event $E_{7i}(w_1',t_{i-1},t_i,j_i)$ conditioned on $E_1^c, E_2^c, E_3^c, E_4^c, E_5^c, E_6^c$ can be bounded
as follows, depending on the values of $t_{i-1}$ and $t_i$:

i) if $t_{i-1} \ne 1$ then $(\mathbf{u}_i(w_c,t_{i-1},t_i,w_1',j_i), \mathbf{x}_{2,i}(w_c,t_{i-1}), \mathbf{v}_i(w_c,t_{i-1},t_i))$ is generated independently of the output
vector $\mathbf{y}[i]$ irrespective of the value of $t_i$, and so, by the joint typicality lemma [43, Lecture Note 2],

$$\Pr(E_{7i}(w_1',t_{i-1},t_i,j_i)|\mathcal{E}^c) \le 2^{-n[I(U,V,X_2;Y)-\epsilon]}. \tag{B-14}$$

ii) if $t_{i-1} = 1$ and $t_i \ne 1$, then $(\mathbf{u}_i(w_c,t_{i-1},t_i,w_1',j_i), \mathbf{v}_i(w_c,t_{i-1},t_i))$ is generated independently of the output
vector $\mathbf{y}[i]$ conditionally on $\mathbf{x}_{2,i}(w_c,t_{i-1})$; and, hence

$$\Pr(E_{7i}(w_1',t_{i-1},t_i,j_i)|\mathcal{E}^c) \le 2^{-n[I(U,V;Y|X_2)-\epsilon]}. \tag{B-15}$$

iii) if $t_{i-1} = 1$ and $t_i = 1$, then $\mathbf{u}_i(w_c,t_{i-1},t_i,w_1',j_i)$ is generated independently of the output vector $\mathbf{y}[i]$
conditionally on $\mathbf{x}_{2,i}(w_c,t_{i-1})$ and $\mathbf{v}_i(w_c,t_{i-1},t_i)$; and, hence

$$\Pr(E_{7i}(w_1',t_{i-1},t_i,j_i)|\mathcal{E}^c) \le 2^{-n[I(U;Y|V,X_2)-\epsilon]}. \tag{B-16}$$

Now, note that since $I(U,V;Y|X_2) \ge I(U;Y|V,X_2)$, if $w_1' \ne w_1$ and $t_{i-1} = 1$ the following holds irrespective of
the value of $t_i$,

$$\Pr(E_{7i}(w_1',t_{i-1},t_i,j_i)|\mathcal{E}^c) \le 2^{-n[I(U;Y|V,X_2)-\epsilon]}. \tag{B-17}$$

Let $I_1 := I(U;Y|V,X_2)$ and $I_2 := I(U,V,X_2;Y)$. If the sequence $(t_1,\ldots,t_{B-1})$ has $k$ ones, we have

$$\prod_{i=2}^{B} \Pr(E_{7i}(w_1',t_{i-1},t_i,j_i)|\mathcal{E}^c) \le 2^{-n[kI_1 + (B-1-k)I_2 - (B-1)\epsilon]}. \tag{B-18}$$

Continuing from (B-13), we then bound the probability of the event $E_7$ as

$$\begin{aligned}
\Pr(E_7|\mathcal{E}^c)
&\le \sum_{w_1' \ne w_1} \sum_{t^B \in [1:M]^B} \sum_{j^B \in [1:J]^B} \prod_{i=2}^{B} \Pr(E_{7i}(w_1',t_{i-1},t_i,j_i)|\mathcal{E}^c) \\
&\le \sum_{w_1' \ne w_1} \sum_{t_B \in [1:M]} \sum_{j^B \in [1:J]^B} \sum_{k=0}^{B-1} \binom{B-1}{k} 2^{n(B-1-k)[R+\eta\epsilon]}\, 2^{-n[kI_1+(B-1-k)I_2-(B-1)\epsilon]} \\
&\le \sum_{w_1' \ne w_1} \sum_{t_B \in [1:M]} \sum_{j_B \in [1:J]} \sum_{k=0}^{B-1} \binom{B-1}{k} 2^{n(B-1)[I(U;S|V,X_2)+\delta\epsilon]}\, 2^{-n[kI_1+(B-1-k)(I_2-R)-(B-1)(\eta+1)\epsilon]} \\
&= \sum_{w_1' \ne w_1} \sum_{t_B \in [1:M]} \sum_{j_B \in [1:J]} \sum_{k=0}^{B-1} \binom{B-1}{k} 2^{-n[k(I_1-I(U;S|V,X_2))+(B-1-k)(I_2-R-I(U;S|V,X_2))-(B-1)(\eta+\delta+1)\epsilon]} \\
&\le \sum_{w_1' \ne w_1} \sum_{t_B \in [1:M]} \sum_{j_B \in [1:J]} \sum_{k=0}^{B-1} \binom{B-1}{k} 2^{-n[(B-1)\min(I_1-I(U;S|V,X_2),\, I_2-R-I(U;S|V,X_2))-(B-1)(\eta+\delta+1)\epsilon]} \\
&\le M_1 M J\, 2^{B}\, 2^{-n[(B-1)\min(I_1-I(U;S|V,X_2),\, I_2-R-I(U;S|V,X_2))-(B-1)(\eta+\delta+1)\epsilon]}.
\end{aligned} \tag{B-19}$$

The right-hand side (RHS) of (B-8) tends to zero as $n \to \infty$ if

Ri < ^( min (r, - I(U; S\V, X 2 ), h~R- W; S\V, X 2 )) - | - (B-20) 



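Finally, using (B-2) to eliminate $\hat{R}$ from (B-9) and taking $B \to \infty$, we get $\Pr(E_7 \mid E_1^c, E_2^c, E_3^c, E_4^c, E_5^c, E_6^c) \to 0$ as long as

$$R_1 < I_1 - I(U;S|V,X_2) = I(U;Y|V,X_2) - I(U;S|V,X_2) \tag{B-21}$$

and

$$R_1 < I_2 - I(V;S|X_2) - I(U;S|V,X_2) = I(U,V,X_2;Y) - I(U,V,X_2;S), \tag{B-22}$$

where the last equality uses the chain rule and the independence of $X_2$ and $S$.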
Finally, noting that the condition (B-22) is redundant as $R_c \ge 0$ in (B-10), we obtain that the probability of error tends to zero as $n \to \infty$ and $B \to \infty$ if

$$R_1 < I(U;Y|V,X_2) - I(U;S|V,X_2) \tag{B-23a}$$

$$R_c + R_1 < I(U,V,X_2;Y) - I(U,V,X_2;S). \tag{B-23b}$$

This completes the proof of achievability.
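The step from $I_2 - I(V;S|X_2) - I(U;S|V,X_2)$ to $I(U,V,X_2;Y) - I(U,V,X_2;S)$ in (B-22) rests on the identity $I(U,V,X_2;S) = I(V;S|X_2) + I(U;S|V,X_2)$, which holds when $X_2$ is independent of $S$. The following is a minimal numerical sanity check of that identity on a randomly drawn pmf of the form $Q_S P_{X_2} P_{V|S,X_2} P_{U|V,S,X_2}$. The alphabet sizes, the random pmf and all function names are illustrative assumptions and do not come from the paper.

```python
import itertools
import math
import random


def random_pmf(size, rng):
    """Return a random, strictly positive probability vector of the given length."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def build_joint(rng, ns=2, nx=2, nv=3, nu=2):
    """Joint pmf p(s, x2, v, u) = Q(s) P(x2) P(v|s,x2) P(u|v,s,x2), stored as a dict."""
    q_s, p_x2 = random_pmf(ns, rng), random_pmf(nx, rng)
    joint = {}
    for s, x2 in itertools.product(range(ns), range(nx)):
        p_v = random_pmf(nv, rng)          # P(v | s, x2)
        for v in range(nv):
            p_u = random_pmf(nu, rng)      # P(u | v, s, x2)
            for u in range(nu):
                joint[(s, x2, v, u)] = q_s[s] * p_x2[x2] * p_v[v] * p_u[u]
    return joint


def entropy(joint, coords):
    """Entropy in bits of the marginal on the given coordinate positions."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mi(joint, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))


S, X2, V, U = 0, 1, 2, 3  # coordinate positions inside each outcome tuple


def check_identity(seed=0):
    """Return (I(U,V,X2;S), I(V;S|X2) + I(U;S|V,X2)) for one random pmf."""
    joint = build_joint(random.Random(seed))
    lhs = cond_mi(joint, (U, V, X2), (S,))
    rhs = cond_mi(joint, (V,), (S,), (X2,)) + cond_mi(joint, (U,), (S,), (V, X2))
    return lhs, rhs
```

Calling `check_identity` for several seeds should return two values that agree up to floating-point error; this is only a spot check, not a proof.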

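2) Converse Part of Theorem 1: We prove that for any $(M_c, M_1, n, \epsilon)$ code consisting of a mapping $\varphi_1 : \mathcal{W}_c \times \mathcal{W}_1 \times \mathcal{S}^n \to \mathcal{X}_1^n$ at Encoder 1, a sequence of mappings $\varphi_{2,i} : \mathcal{W}_c \times \mathcal{S}^{i-1} \to \mathcal{X}_2$, $i = 1, \ldots, n$, at Encoder 2, and a mapping $\psi : \mathcal{Y}^n \to \mathcal{W}_c \times \mathcal{W}_1$ at the decoder with average error probability $P_e^n \to 0$ as $n \to \infty$ and rates $R_c = n^{-1} \log_2 M_c$ and $R_1 = n^{-1} \log_2 M_1$, there exist random variables $(V, U, X_1, X_2) \in \mathcal{V} \times \mathcal{U} \times \mathcal{X}_1 \times \mathcal{X}_2$ with $U$ and $V$ satisfying (9) such that the joint distribution $P_{S,V,U,X_1,X_2}$ is of the form

$$P_{S,V,U,X_1,X_2} = Q_S P_{X_2} P_{V|S,X_2} P_{U,X_1|V,S,X_2}, \tag{B-24}$$

the marginal distribution of $S$ is $Q_S(s)$, i.e.,

$$\sum_{v,u,x_1,x_2} P_{S,V,U,X_1,X_2}(s,v,u,x_1,x_2) = Q_S(s) \tag{B-25}$$

and the rate pair $(R_c, R_1)$ satisfies (8).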
Define the random variables

$$V_i = (W_c, S^{i-1}, Y_{i+1}^n), \qquad U_i = (W_1, V_i). \tag{B-26}$$

Observe that the random variables so defined satisfy

$$(S_i, U_i, V_i, X_{1,i}, X_{2,i}, Y_i) \in \mathcal{P}, \quad \forall\, i \in \{1, \ldots, n\}. \tag{B-27}$$

We first prove the following auxiliary result. 
Lemma 1: The following inequalities hold:

$$I(W_1; Y^n | W_c) - I(W_1; S^n | W_c) \le \sum_{i=1}^{n} I(U_i; Y_i | V_i, X_{2,i}) - I(U_i; S_i | V_i, X_{2,i}) \tag{B-28}$$

$$I(W_c, W_1; Y^n) - I(W_c, W_1; S^n) \le \sum_{i=1}^{n} I(U_i, V_i, X_{2,i}; Y_i) - I(U_i, V_i, X_{2,i}; S_i) \tag{B-29}$$

Proof: i) We show the first inequality in the lemma as follows. 

\begin{align}
& I(W_1; Y^n | W_c) - I(W_1; S^n | W_c) \tag{B-30}\\
&= \sum_{i=1}^{n} I(W_1; Y_i | W_c, Y_{i+1}^n) - I(W_1; S_i | W_c, S^{i-1}) \tag{B-31}\\
&= \sum_{i=1}^{n} I(W_1, S^{i-1}; Y_i | W_c, Y_{i+1}^n) - I(S^{i-1}; Y_i | W_c, W_1, Y_{i+1}^n) - I(W_1; S_i | W_c, S^{i-1}) \tag{B-32}\\
&= \sum_{i=1}^{n} \big[ I(W_1, S^{i-1}; Y_i | W_c, Y_{i+1}^n) - I(W_1; S_i | W_c, S^{i-1}) \big] - \sum_{i=1}^{n} I(S^{i-1}; Y_i | W_c, W_1, Y_{i+1}^n) \tag{B-33}\\
&\overset{(a)}{=} \sum_{i=1}^{n} \big[ I(W_1, S^{i-1}; Y_i | W_c, Y_{i+1}^n) - I(W_1; S_i | W_c, S^{i-1}) \big] - \sum_{i=1}^{n} I(S_i; Y_{i+1}^n | W_c, W_1, S^{i-1}) \tag{B-34}\\
&= \sum_{i=1}^{n} I(W_1, S^{i-1}; Y_i | W_c, Y_{i+1}^n) - I(S_i; W_1, Y_{i+1}^n | W_c, S^{i-1}) \tag{B-35}\\
&= \sum_{i=1}^{n} I(W_1; Y_i | W_c, S^{i-1}, Y_{i+1}^n) + I(S^{i-1}; Y_i | W_c, Y_{i+1}^n) - I(S_i; Y_{i+1}^n | W_c, S^{i-1}) - I(S_i; W_1 | W_c, S^{i-1}, Y_{i+1}^n) \tag{B-36}\\
&= \sum_{i=1}^{n} \big[ I(W_1; Y_i | W_c, S^{i-1}, Y_{i+1}^n) - I(S_i; W_1 | W_c, S^{i-1}, Y_{i+1}^n) \big] + \sum_{i=1}^{n} I(S^{i-1}; Y_i | W_c, Y_{i+1}^n) - \sum_{i=1}^{n} I(S_i; Y_{i+1}^n | W_c, S^{i-1}) \tag{B-37}\\
&\overset{(b)}{=} \sum_{i=1}^{n} I(W_1; Y_i | W_c, S^{i-1}, Y_{i+1}^n) - I(S_i; W_1 | W_c, S^{i-1}, Y_{i+1}^n) \tag{B-38}\\
&\overset{(c)}{=} \sum_{i=1}^{n} I(W_1; Y_i | W_c, S^{i-1}, Y_{i+1}^n, X_{2,i}) - I(S_i; W_1 | W_c, S^{i-1}, Y_{i+1}^n, X_{2,i}) \tag{B-39}\\
&\overset{(d)}{=} \sum_{i=1}^{n} I(U_i; Y_i | V_i, X_{2,i}) - I(U_i; S_i | V_i, X_{2,i}) \tag{B-40}
\end{align}

where (a) and (b) follow, respectively, from the following two instances of Csiszár and Körner's sum identity:

$$\sum_{i=1}^{n} I(S^{i-1}; Y_i | W_c, W_1, Y_{i+1}^n) = \sum_{i=1}^{n} I(S_i; Y_{i+1}^n | W_c, W_1, S^{i-1}) \tag{B-41}$$

$$\sum_{i=1}^{n} I(S^{i-1}; Y_i | W_c, Y_{i+1}^n) = \sum_{i=1}^{n} I(S_i; Y_{i+1}^n | W_c, S^{i-1}); \tag{B-42}$$

(c) follows from the fact that $X_{2,i}$ is a deterministic function of $(W_c, S^{i-1})$, and (d) follows by the definition of the random variables $U_i$ and $V_i$ in (B-26).

ii) Similarly, we show the second inequality in the lemma as follows. 

\begin{align}
& I(W_c, W_1; Y^n) - I(W_c, W_1; S^n) \tag{B-43}\\
&= \sum_{i=1}^{n} I(W_c, W_1; Y_i | Y_{i+1}^n) - I(W_c, W_1; S_i | S^{i-1}) \tag{B-44}\\
&= \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - I(S^{i-1}; Y_i | W_c, W_1, Y_{i+1}^n) - I(W_c, W_1; S_i | S^{i-1}) \tag{B-45}\\
&= \sum_{i=1}^{n} \big[ I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - I(W_c, W_1; S_i | S^{i-1}) \big] - \sum_{i=1}^{n} I(S^{i-1}; Y_i | W_c, W_1, Y_{i+1}^n) \tag{B-46}\\
&\overset{(e)}{=} \sum_{i=1}^{n} \big[ I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - I(W_c, W_1; S_i | S^{i-1}) \big] - \sum_{i=1}^{n} I(Y_{i+1}^n; S_i | W_c, W_1, S^{i-1}) \tag{B-47}\\
&= \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - I(W_c, W_1, Y_{i+1}^n; S_i | S^{i-1}) \tag{B-48}\\
&= \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - H(S_i | S^{i-1}) + H(S_i | W_c, W_1, S^{i-1}, Y_{i+1}^n) \tag{B-49}\\
&\overset{(f)}{=} \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - H(S_i) + H(S_i | W_c, W_1, S^{i-1}, Y_{i+1}^n) \tag{B-50}\\
&= \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}; Y_i | Y_{i+1}^n) - I(W_c, W_1, S^{i-1}, Y_{i+1}^n; S_i) \tag{B-51}\\
&\le \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}, Y_{i+1}^n; Y_i) - I(W_c, W_1, S^{i-1}, Y_{i+1}^n; S_i) \tag{B-52}\\
&\overset{(g)}{=} \sum_{i=1}^{n} I(W_c, W_1, S^{i-1}, Y_{i+1}^n, X_{2,i}; Y_i) - I(W_c, W_1, S^{i-1}, Y_{i+1}^n, X_{2,i}; S_i) \tag{B-53}\\
&\overset{(h)}{=} \sum_{i=1}^{n} I(U_i, V_i, X_{2,i}; Y_i) - I(U_i, V_i, X_{2,i}; S_i) \tag{B-54}
\end{align}

where (e) follows from Csiszár and Körner's sum identity (B-41); (f) follows from the fact that the state $S^n$ is i.i.d.; (g) follows from the fact that $X_{2,i}$ is a deterministic function of $(W_c, S^{i-1})$; and (h) follows by the definition of the random variables $U_i$ and $V_i$ in (B-26). ■
We continue the proof of the converse. The decoder map $\psi$ recovers $(W_c, W_1)$ from $Y^n$ with vanishing average error probability $P_e^n$. By Fano's inequality we have

$$H(W_c, W_1 | Y^n) \le n\epsilon_n, \tag{B-55}$$

where $\epsilon_n \to 0$ as $P_e^n \to 0$.

We can bound the individual rate as

\begin{align}
nR_1 &\le H(W_1 | W_c) \tag{B-56}\\
&= I(W_1; Y^n | W_c) + H(W_1 | Y^n, W_c) \tag{B-57}\\
&\overset{(i)}{\le} I(W_1; Y^n | W_c) + n\epsilon_n \tag{B-58}\\
&\overset{(j)}{=} I(W_1; Y^n | W_c) - I(W_1; S^n | W_c) + n\epsilon_n \tag{B-59}\\
&\overset{(k)}{\le} \sum_{i=1}^{n} I(U_i; Y_i | V_i, X_{2,i}) - I(U_i; S_i | V_i, X_{2,i}) + n\epsilon_n \tag{B-60}
\end{align}

where (i) follows by using (B-55) and the fact that $H(W_1 | W_c, Y^n) \le H(W_c, W_1 | Y^n)$; (j) follows from the fact that the messages are independent of each other and of the state sequence; and (k) follows by Lemma 1.
Similarly, we can bound the sum rate as

\begin{align}
n(R_c + R_1) &\le H(W_c, W_1) \tag{B-61}\\
&= I(W_c, W_1; Y^n) + H(W_c, W_1 | Y^n) \tag{B-62}\\
&\overset{(l)}{\le} I(W_c, W_1; Y^n) + n\epsilon_n \tag{B-63}\\
&\overset{(m)}{=} I(W_c, W_1; Y^n) - I(W_c, W_1; S^n) + n\epsilon_n \tag{B-64}\\
&\overset{(n)}{\le} \sum_{i=1}^{n} I(U_i, V_i, X_{2,i}; Y_i) - I(U_i, V_i, X_{2,i}; S_i) + n\epsilon_n, \tag{B-65}
\end{align}

where (l) follows by (B-55); (m) follows from the fact that the messages are independent of the state sequence; and (n) follows by Lemma 1.
From the above, we get that

\begin{align}
R_1 &\le \frac{1}{n} \sum_{i=1}^{n} I(U_i; Y_i | V_i, X_{2,i}) - I(U_i; S_i | V_i, X_{2,i}) + \epsilon_n \nonumber\\
R_c + R_1 &\le \frac{1}{n} \sum_{i=1}^{n} I(U_i, V_i, X_{2,i}; Y_i) - I(U_i, V_i, X_{2,i}; S_i) + \epsilon_n. \tag{B-66}
\end{align}

The statement of the converse now follows by applying to (B-66) the standard time-sharing argument and taking the limit of large $n$. This is shown briefly here. We introduce a random variable $T$ which is independent of $S$ and uniformly distributed over $\{1, \ldots, n\}$. Set $S = S_T$, $U = U_T$, $V = V_T$, $X_1 = X_{1,T}$, $X_2 = X_{2,T}$, and $Y = Y_T$. Then, considering the first bound in (B-66), we obtain

\begin{align}
\frac{1}{n} \sum_{i=1}^{n} I(U_i; Y_i | V_i, X_{2,i}) - I(U_i; S_i | V_i, X_{2,i})
&= I(U; Y | V, X_2, T) - I(U; S | V, X_2, T) \nonumber\\
&= I(U, T; Y | V, X_2, T) - I(U, T; S | V, X_2, T). \tag{B-67}
\end{align}

Similarly, considering the second bound in (B-66), we obtain

\begin{align}
\frac{1}{n} \sum_{i=1}^{n} I(U_i, V_i, X_{2,i}; Y_i) - I(U_i, V_i, X_{2,i}; S_i)
&= I(U, V, X_2; Y | T) - I(U, V, X_2; S | T) \nonumber\\
&= I(T, U, V, X_2; Y) - I(T; Y) - I(T, U, V, X_2; S) + I(T; S) \nonumber\\
&\le I(T, U, V, X_2; Y) - I(T, U, V, X_2; S), \tag{B-68}
\end{align}

where the inequality holds because $I(T; S) = 0$ ($T$ is independent of $S$) and $I(T; Y) \ge 0$.
The distribution on $(T, S, U, V, X_1, X_2, Y)$ from the given code is of the form

$$P_{T,S,U,V,X_1,X_2,Y} = Q_S P_T P_{X_2|T} P_{V|X_2,S,T} P_{U,X_1|V,S,X_2,T} W_{Y|X_1,X_2,S}. \tag{B-69}$$

Let us now define $\bar{U} = (U, T)$ and $\bar{V} = (V, T)$. Using (B-66), (B-67) and (B-68), we then get

\begin{align}
R_1 &\le I(\bar{U}; Y | \bar{V}, X_2) - I(\bar{U}; S | \bar{V}, X_2) + \epsilon_n \nonumber\\
R_c + R_1 &\le I(\bar{U}, \bar{V}, X_2; Y) - I(\bar{U}, \bar{V}, X_2; S) + \epsilon_n, \tag{B-70}
\end{align}

where the distribution on $(S, \bar{U}, \bar{V}, X_1, X_2, Y)$, obtained by marginalizing (B-69) over the time-sharing random variable $T$, satisfies $(S, \bar{U}, \bar{V}, X_1, X_2, Y) \in \mathcal{P}$.

So far we have shown that, for a given sequence of $(\epsilon_n, n, R_c, R_1)$-codes with $\epsilon_n$ going to zero as $n$ goes to infinity, there exist random variables $(S, U, V, X_1, X_2, Y) \in \mathcal{P}$ such that the rate pair $(R_c, R_1)$ essentially satisfies the inequalities in (8), i.e., $(R_c, R_1) \in \mathcal{C}$.

This completes the proof of the converse part and of Theorem 1.
C. Proof of Theorem 2

The transmission takes place in $B$ blocks. The common message $W_c$ is divided into $B$ blocks $w_{c,1}, \ldots, w_{c,B}$ of $nR_c$ bits each, and the individual message $W_1$ is divided into $B$ blocks $w_{1,1}, \ldots, w_{1,B}$ of $nR_1$ bits each. For convenience, we let $w_{c,B} = w_{1,B} = 1$ (a default value). We thus have $B_{W_c} = n(B-1)R_c$, $B_{W_1} = n(B-1)R_1$, $N = nB$, $R_{W_c} = B_{W_c}/N = R_c \cdot (B-1)/B$ and $R_{W_1} = B_{W_1}/N = R_1 \cdot (B-1)/B$, where $B_{W_c}$ is the number of common message bits, $B_{W_1}$ is the number of individual message bits, $N$ is the number of channel uses, and $R_{W_c}$ and $R_{W_1}$ are the overall rates of the common and individual messages, respectively. For fixed $n$, the average rate pair $(R_{W_c}, R_{W_1})$ over $B$ blocks can be made as close to $(R_c, R_1)$ as desired by making $B$ large.
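The rate loss caused by the default last block can be made concrete with a short computation. The sketch below simply evaluates the factor $(B-1)/B$ from the paragraph above; the function name and the sample values are illustrative assumptions.

```python
def overall_rates(r_c, r_1, num_blocks):
    """Overall rates (R_Wc, R_W1) when only B-1 of the B blocks carry fresh message bits."""
    if num_blocks < 2:
        raise ValueError("block-Markov coding needs at least two blocks")
    scale = (num_blocks - 1) / num_blocks  # the factor (B-1)/B
    return r_c * scale, r_1 * scale


def blocks_needed(loss_fraction):
    """Smallest B such that the relative rate loss 1/B is at most loss_fraction."""
    if not 0 < loss_fraction < 1:
        raise ValueError("loss_fraction must lie strictly between 0 and 1")
    num_blocks = 2
    while 1 / num_blocks > loss_fraction:
        num_blocks += 1
    return num_blocks
```

For instance, `overall_rates(1.0, 0.5, 10)` scales both per-block rates by 9/10, and increasing `num_blocks` drives the overall rates toward the per-block rates.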

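Codebook Generation: Fix a measure $P_{S,U,V,X_1,X_2,Y} \in \mathcal{P}$. Fix $\epsilon > 0$ and denote $M_c = 2^{n[R_c - \eta_c \epsilon]}$, $M_1 = 2^{n[R_1 - \eta_1 \epsilon]}$, $M_0 = 2^{n[R_0 + \eta_0 \epsilon]}$, $M = 2^{n[\hat{R} + \eta \epsilon]}$, $J = 2^{n[I(U;S|V,X_2) + \delta \epsilon]}$.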

1) We generate $M_c M_0$ independent and identically distributed (i.i.d.) codewords $\mathbf{x}_2(w_c, s)$ indexed by $w_c = 1, \ldots, M_c$, $s = 1, \ldots, M_0$, each with i.i.d. components drawn according to $P_{X_2}$.

2) For each codeword $\mathbf{x}_2(w_c, s)$, we generate $M$ i.i.d. codewords $\mathbf{v}(w_c, s, z)$ indexed by $z = 1, \ldots, M$, each with i.i.d. components drawn according to $P_{V|X_2}$.

3) For each codeword $\mathbf{x}_2(w_c, s)$ and each codeword $\mathbf{v}(w_c, s, z)$, we generate a collection of $J M_1$ i.i.d. codewords $\{\mathbf{u}(w_c, s, z, w_1, j)\}$ indexed by $w_1 = 1, \ldots, M_1$, $j = 1, \ldots, J$, each with i.i.d. components drawn according to $P_{U|V,X_2}$.

4) Randomly partition the set $\{1, \ldots, M\}$ into $M_0$ cells $C_s$, $s \in [1, M_0]$.
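The random partition in step 4 can be sketched as follows, under the assumption that each compression index is assigned to a cell independently and uniformly at random (the text only says "randomly partition"). The function name, the seed argument and the returned data structures are hypothetical.

```python
import random


def random_partition(num_indices, num_cells, seed=None):
    """Assign each index z in 1..num_indices to a uniformly random cell in 1..num_cells.

    Returns (cell_of, cells): cell_of[z] is the cell label of z, and
    cells[s] is the sorted list of indices that landed in cell s.
    """
    rng = random.Random(seed)
    cell_of = {z: rng.randint(1, num_cells) for z in range(1, num_indices + 1)}
    cells = {s: [] for s in range(1, num_cells + 1)}
    for z in range(1, num_indices + 1):
        cells[cell_of[z]].append(z)
    return cell_of, cells
```

Here `cell_of` plays the role of the map from a compression index $z$ to its cell index $s$ with $z \in C_s$, which is the only information about $z$ that is forwarded through $\mathbf{x}_2$ in the next block.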

Encoding: Suppose that a common message $W_c = w_c$ and an individual message $W_1 = w_1$ are to be transmitted. As mentioned previously, message $w_c$ is divided into $B$ blocks $w_{c,1}, \ldots, w_{c,B}$ and message $w_1$ is divided into $B$ blocks $w_{1,1}, \ldots, w_{1,B}$, with $(w_{c,i}, w_{1,i})$ the pair of messages sent in block $i$. We denote by $\mathbf{s}[i]$ the channel state in block $i$, $i = 1, \ldots, B$. For convenience, we let $\mathbf{s}[0] = \emptyset$ and $z_0 = 1$ (a default value), and $s_0$ the index of the cell containing $z_0$, i.e., $z_0 \in C_{s_0}$. The encoding at the beginning of block $i$, $i = 1, \ldots, B$, is as follows.

Encoder 2, which has learned the state sequence $\mathbf{s}[i-1]$, knows $s_{i-2}$ and looks for a compression index $z_{i-1} \in [1, M]$ such that $\mathbf{v}(w_{c,i-1}, s_{i-2}, z_{i-1})$ is strongly jointly typical with $\mathbf{s}[i-1]$ and $\mathbf{x}_2(w_{c,i-1}, s_{i-2})$. If there is no such index or the observed state $\mathbf{s}[i-1]$ is not typical, $z_{i-1}$ is set to 1 and an error is declared. If there is more than one such index $z_{i-1}$, the smallest is chosen. One can show that the probability of error of this event is arbitrarily small provided that $n$ is large and

$$\hat{R} > I(V; S | X_2). \tag{C-1}$$

Encoder 2 then transmits the vector $\mathbf{x}_2(w_{c,i}, s_{i-1})$, where $s_{i-1}$ is such that $z_{i-1} \in C_{s_{i-1}}$.
Encoder 1 obtains $\mathbf{x}_2(w_{c,i}, s_{i-1})$ similarly. It then finds the smallest compression index $z_i \in [1, M]$ such that $\mathbf{v}(w_{c,i}, s_{i-1}, z_i)$ is strongly jointly typical with $\mathbf{s}[i]$ and $\mathbf{x}_2(w_{c,i}, s_{i-1})$. Again, if there is no such index or the observed state $\mathbf{s}[i]$ is not typical, $z_i$ is set to 1 and an error is declared. Let $s_i \in [1, M_0]$ be such that $z_i \in C_{s_i}$. Next, Encoder 1 looks for the smallest $j_i$ such that $\mathbf{u}(w_{c,i}, s_{i-1}, z_i, w_{1,i}, j_i)$ is jointly typical with $\mathbf{s}[i]$, $\mathbf{x}_2(w_{c,i}, s_{i-1})$ and $\mathbf{v}(w_{c,i}, s_{i-1}, z_i)$. Denote this $j_i$ by $j_i^* = j(\mathbf{s}[i], w_{c,i}, s_{i-1}, z_i, w_{1,i})$. If no such $j_i$ is found, an error is declared and $j(\mathbf{s}[i], w_{c,i}, s_{i-1}, z_i, w_{1,i})$ is set to $j_i = J$. Encoder 1 then transmits a vector $\mathbf{x}_1[i]$ which is drawn i.i.d. conditionally given $\mathbf{s}[i]$, $\mathbf{u}(w_{c,i}, s_{i-1}, z_i, w_{1,i}, j_i^*)$, $\mathbf{v}(w_{c,i}, s_{i-1}, z_i)$ and $\mathbf{x}_2(w_{c,i}, s_{i-1})$ (using the conditional measure $P_{X_1|S,U,V,X_2}$ induced by $P_{S,U,V,X_1,X_2,Y} \in \mathcal{P}$).
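Both encoders use the same selection rule: scan the candidate indices in increasing order, keep the first one that passes a joint-typicality test, and fall back to a default value (declaring an error) if none passes. A minimal sketch of that rule is given below; the typicality test is abstracted as a caller-supplied predicate, and the function and parameter names are hypothetical.

```python
def smallest_passing_index(num_candidates, passes_test, default):
    """Return (index, ok) for the smallest index in 1..num_candidates passing the test.

    passes_test: callable taking an index and returning True if the corresponding
    codeword is jointly typical with the observed sequences (not implemented here).
    If no index passes, (default, False) is returned, which models the error event.
    """
    for index in range(1, num_candidates + 1):
        if passes_test(index):
            return index, True
    return default, False
```

With this helper, the compression step would use `default=1` and the binning step would use `default=J`, mirroring the two fallback values in the encoding description.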
Decoding: Let $\mathbf{y}[i]$ denote the information received at the receiver at block $i$, $i = 1, \ldots, B$. The receiver collects this information until the last block of transmission is completed. The decoder then performs Willems' backward decoding [45], by first decoding the pair $(w_{c,B-1}, w_{1,B-1})$ from $\mathbf{y}[B-1]$.

1) Decoding in Block $B-1$:

The decoding of the pair $(w_{c,B-1}, w_{1,B-1})$ is performed in four steps, as follows.
Step (a): The decoder knows $w_{c,B} = 1$ and looks for the unique cell index $\hat{s}_{B-1}$ such that the vector $\mathbf{x}_2(w_{c,B}, \hat{s}_{B-1})$ is jointly typical with $\mathbf{y}[B]$. The decoding operation in this step incurs a small probability of error as long as $n$ is sufficiently large and

$$R_0 < I(X_2; Y). \tag{C-2}$$

Step (b): The decoder now knows $\hat{s}_{B-1}$ (i.e., the index of the cell in which the compression index $z_{B-1}$ lies). It then decodes message $w_{c,B-1}$ by looking for the unique $\hat{w}_{c,B-1}$ such that $\mathbf{x}_2(\hat{w}_{c,B-1}, s_{B-2})$, $\mathbf{v}(\hat{w}_{c,B-1}, s_{B-2}, z_{B-1})$, $\mathbf{u}(\hat{w}_{c,B-1}, s_{B-2}, z_{B-1}, w_{1,B-1}, j_{B-1})$ and $\mathbf{y}[B-1]$ are jointly typical for some $s_{B-2} \in [1, M_0]$, $w_{1,B-1} \in [1, M_1]$, $j_{B-1} \in [1, J]$ and $z_{B-1} \in C_{\hat{s}_{B-1}}$. One can show that the decoder obtains the correct $w_{c,B-1}$ as long as $n$ and $B$ are large and

$$R_0 + (\hat{R} - R_0) + R_c + R_1 < I(U, V, X_2; Y) - I(U; S | V, X_2). \tag{C-3}$$

Step (c): The decoder knows $\hat{w}_{c,B-1}$ and can again obtain the correct $s_{B-2}$ if $n$ is large and (C-2) is true. This is accomplished by looking for the unique $\hat{s}_{B-2}$ such that the vector $\mathbf{x}_2(\hat{w}_{c,B-1}, \hat{s}_{B-2})$ is jointly typical with $\mathbf{y}[B-1]$.
Step (d): Finally, the decoder, which now knows message $\hat{w}_{c,B-1}$ and the cell index $\hat{s}_{B-2}$ (but not the exact compression index $z_{B-1}$), estimates $w_{1,B-1}$ using $\mathbf{y}[B-1]$. It declares that $\hat{w}_{1,B-1}$ was sent if there exists a unique $\hat{w}_{1,B-1}$ such that $\mathbf{x}_2(\hat{w}_{c,B-1}, \hat{s}_{B-2})$, $\mathbf{v}(\hat{w}_{c,B-1}, \hat{s}_{B-2}, z'_{B-1})$, $\mathbf{u}(\hat{w}_{c,B-1}, \hat{s}_{B-2}, z'_{B-1}, \hat{w}_{1,B-1}, j_{B-1})$ and $\mathbf{y}[B-1]$ are jointly typical for some $z'_{B-1} \in C_{\hat{s}_{B-1}}$ and $j_{B-1} \in [1, J]$.

• If $z'_{B-1} = z_{B-1}$, the decoder finds the correct $w_{1,B-1}$ for sufficiently large $n$ if

$$R_1 < I(U; Y | V, X_2) - I(U; S | V, X_2). \tag{C-4}$$

• If $z'_{B-1} \ne z_{B-1}$, the decoder finds the correct $w_{1,B-1}$ for sufficiently large $n$ if

$$(\hat{R} - R_0) + R_1 < I(U, V; Y | X_2) - I(U; S | V, X_2). \tag{C-5}$$

2) Decoding in Block $b$, $b = B-1, B-2, \ldots, 2$:

Next, for $b$ ranging from $B-1$ to $2$, the decoding of the pair $(w_{c,b-1}, w_{1,b-1})$ is performed similarly, in four steps, by using the information $\mathbf{y}[b]$ received in block $b$ and the information $\mathbf{y}[b-1]$ received in block $b-1$. More specifically, this is done as follows.

Step (a): The decoder knows $\hat{w}_{c,b}$ and looks for the unique cell index $\hat{s}_{b-1}$ such that the vector $\mathbf{x}_2(\hat{w}_{c,b}, \hat{s}_{b-1})$ is jointly typical with $\mathbf{y}[b]$. The decoding error in this step is small for sufficiently large $n$ if (C-2) is true.
Step (b): The decoder knows $\hat{s}_{b-1}$ and decodes message $w_{c,b-1}$ from $\mathbf{y}[b]$. It looks for the unique $\hat{w}_{c,b-1}$ such that $\mathbf{x}_2(\hat{w}_{c,b-1}, s_{b-2})$, $\mathbf{v}(\hat{w}_{c,b-1}, s_{b-2}, z_{b-1})$, $\mathbf{u}(\hat{w}_{c,b-1}, s_{b-2}, z_{b-1}, w_{1,b-1}, j_{b-1})$ and $\mathbf{y}[b-1]$ are jointly typical for some $s_{b-2} \in [1, M_0]$, $w_{1,b-1} \in [1, M_1]$, $j_{b-1} \in [1, J]$ and $z_{b-1} \in C_{\hat{s}_{b-1}}$. One can show that the decoding error in this step is small for sufficiently large $n$ if (C-3) is true.
Step (c): The decoder knows $\hat{w}_{c,b-1}$ and obtains $\hat{s}_{b-2}$ by looking for the unique $\hat{s}_{b-2}$ such that the vector $\mathbf{x}_2(\hat{w}_{c,b-1}, \hat{s}_{b-2})$ is jointly typical with $\mathbf{y}[b-1]$. For sufficiently large $n$, the decoder obtains the correct $s_{b-2}$ with high probability if (C-2) is true.

Step (d): Finally, the decoder, which now knows message $\hat{w}_{c,b-1}$ and the cell index $\hat{s}_{b-2}$ (but not the exact compression index $z_{b-1}$), estimates message $w_{1,b-1}$ using $\mathbf{y}[b-1]$. It declares that $\hat{w}_{1,b-1}$ was sent if there exists a unique $\hat{w}_{1,b-1}$ such that $\mathbf{x}_2(\hat{w}_{c,b-1}, \hat{s}_{b-2})$, $\mathbf{v}(\hat{w}_{c,b-1}, \hat{s}_{b-2}, z'_{b-1})$, $\mathbf{u}(\hat{w}_{c,b-1}, \hat{s}_{b-2}, z'_{b-1}, \hat{w}_{1,b-1}, j_{b-1})$ and $\mathbf{y}[b-1]$ are jointly typical for some $z'_{b-1} \in C_{\hat{s}_{b-1}}$ and $j_{b-1} \in [1, J]$.

• If $z'_{b-1} = z_{b-1}$, the decoder finds the correct $w_{1,b-1}$ for sufficiently large $n$ if (C-4) is true.

• If $z'_{b-1} \ne z_{b-1}$, the decoder finds the correct $w_{1,b-1}$ for sufficiently large $n$ if (C-5) is true.

Fourier-Motzkin Elimination: From the above, we get that the error probability is small provided that $n$ is large and

\begin{align}
R_0 &< I(X_2; Y) \tag{C-6a}\\
\hat{R} &> I(V; S | X_2) \tag{C-6b}\\
R_1 &< I(U; Y | V, X_2) - I(U; S | V, X_2) \tag{C-6c}\\
(\hat{R} - R_0) + R_1 &< I(U, V; Y | X_2) - I(U; S | V, X_2) \tag{C-6d}\\
R_c + R_1 + \hat{R} &< I(U, V, X_2; Y) - I(U; S | V, X_2). \tag{C-6e}
\end{align}

We now apply Fourier-Motzkin Elimination (FME) to project out $R_0$ and $\hat{R}$ from (C-6). Projecting out $R_0$ from (C-6), we get

\begin{align}
\hat{R} &> I(V; S | X_2) \tag{C-7a}\\
R_1 &< I(U; Y | V, X_2) - I(U; S | V, X_2) \tag{C-7b}\\
\hat{R} + R_1 &< I(U, V, X_2; Y) - I(U; S | V, X_2) \tag{C-7c}\\
R_c + R_1 + \hat{R} &< I(U, V, X_2; Y) - I(U; S | V, X_2). \tag{C-7d}
\end{align}

Note that the inequality (C-7c) is implied by (C-7d) since $R_c \ge 0$, and so is redundant in (C-7). Finally, projecting out $\hat{R}$ from the remaining system, we obtain

\begin{align}
R_1 &< I(U; Y | V, X_2) - I(U; S | V, X_2) \tag{C-8}\\
R_c + R_1 &< I(U, V, X_2; Y) - I(U, V, X_2; S). \tag{C-9}
\end{align}

This completes the proof of Theorem 2.
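The projection from (C-6) to (C-8)-(C-9) can be spot-checked numerically. The sketch below compares, for a given rate pair, a brute-force grid search for auxiliary rates $(R_0, \hat{R})$ satisfying (C-6) with the closed-form test (C-8)-(C-9). The mutual information values in the usage note are made-up numbers chosen only for illustration, the dictionary keys and function names are hypothetical, and a finite grid can only provide evidence, not a proof.

```python
def satisfies_c6(r_c, r_1, r_0, r_hat, mi):
    """Check the five inequalities (C-6a)-(C-6e) for given auxiliary rates."""
    i_uv_y_given_x2 = mi["V;Y|X2"] + mi["U;Y|V,X2"]   # chain rule for I(U,V;Y|X2)
    i_uvx2_y = mi["X2;Y"] + i_uv_y_given_x2           # chain rule for I(U,V,X2;Y)
    return (
        r_0 < mi["X2;Y"]
        and r_hat > mi["V;S|X2"]
        and r_1 < mi["U;Y|V,X2"] - mi["U;S|V,X2"]
        and (r_hat - r_0) + r_1 < i_uv_y_given_x2 - mi["U;S|V,X2"]
        and r_c + r_1 + r_hat < i_uvx2_y - mi["U;S|V,X2"]
    )


def exists_auxiliary_rates(r_c, r_1, mi, step=0.01, r_max=2.0):
    """Grid search for some (R_0, R_hat) in [0, r_max]^2 satisfying (C-6)."""
    grid = [k * step for k in range(int(round(r_max / step)) + 1)]
    return any(satisfies_c6(r_c, r_1, r_0, r_hat, mi)
               for r_0 in grid for r_hat in grid)


def in_projected_region(r_c, r_1, mi):
    """Closed-form test (C-8)-(C-9); uses I(U,V,X2;S) = I(V;S|X2) + I(U;S|V,X2),
    which holds because X2 is independent of S."""
    i_uvx2_y = mi["X2;Y"] + mi["V;Y|X2"] + mi["U;Y|V,X2"]
    i_uvx2_s = mi["V;S|X2"] + mi["U;S|V,X2"]
    return (r_1 < mi["U;Y|V,X2"] - mi["U;S|V,X2"]
            and r_c + r_1 < i_uvx2_y - i_uvx2_s)
```

As a usage example, with the illustrative values `mi = {"X2;Y": 0.3, "V;Y|X2": 0.4, "U;Y|V,X2": 0.5, "V;S|X2": 0.2, "U;S|V,X2": 0.1}`, both tests accept the pair (0.4, 0.3) and both reject (0.6, 0.35).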

D. Proof of Corollary 1

1) Converse Part: Investigating the proof of Theorem 1 in Appendix B, it can be seen that the auxiliary random variables $U$ and $V$ tacitly satisfy the condition

$$I(V, X_2; Y) - I(V, X_2; S) \ge 0. \tag{D-1}$$

This can be seen by noticing that (with the notation of Appendix B)

$$I(W_1; Y^n | W_c) = \sum_{i=1}^{n} I(U_i; Y_i | V_i, X_{2,i}) - I(U_i; S_i | V_i, X_{2,i}) \tag{D-2}$$

$$I(W_c, W_1; Y^n) \le \sum_{i=1}^{n} I(U_i, V_i, X_{2,i}; Y_i) - I(U_i, V_i, X_{2,i}; S_i) \tag{D-3}$$

and then observing that $I(W_1; Y^n | W_c) \le I(W_c, W_1; Y^n)$, which together yield

$$\sum_{i=1}^{n} I(V_i, X_{2,i}; Y_i) - I(V_i, X_{2,i}; S_i) \ge 0; \tag{D-4}$$

and, so, after standard single-letterization, the condition (D-1).

2) Direct Part: The codebook generation and the encoding process remain exactly as in the proof of Theorem 2 in Appendix C. The decoding at the receiver is modified so that the compression indices are decoded uniquely, as follows (with the notation of Appendix C).

Decoding: Let $\mathbf{y}[i]$ denote the information received at the receiver at block $i$, $i = 1, \ldots, B$. The receiver collects this information until the last block of transmission is completed. The decoder then performs Willems' backward decoding [45], by first decoding the pair $(w_{c,B-1}, w_{1,B-1})$ from $\mathbf{y}[B-1]$.

1) Decoding in Block $B-1$:

The decoding of the pair $(w_{c,B-1}, w_{1,B-1})$ is performed in five steps, as follows.
Step (a): The decoder knows $w_{c,B} = 1$ and looks for the unique cell index $\hat{s}_{B-1}$ such that the vector $\mathbf{x}_2(w_{c,B}, \hat{s}_{B-1})$ is jointly typical with $\mathbf{y}[B]$. This decoding operation incurs a small probability of error as long as $n$ is sufficiently large and

$$R_0 < I(X_2; Y). \tag{D-5}$$

Step (b): The decoder now knows $\hat{s}_{B-1}$ (i.e., the index of the cell in which the compression index $z_{B-1}$ lies). It then decodes message $w_{c,B-1}$ by looking for the unique $\hat{w}_{c,B-1}$ such that $\mathbf{x}_2(\hat{w}_{c,B-1}, s_{B-2})$, $\mathbf{v}(\hat{w}_{c,B-1}, s_{B-2}, z_{B-1})$, $\mathbf{u}(\hat{w}_{c,B-1}, s_{B-2}, z_{B-1}, w_{1,B-1}, j_{B-1})$ and $\mathbf{y}[B-1]$ are jointly typical for some $s_{B-2} \in [1, M_0]$, $w_{1,B-1} \in [1, M_1]$, $j_{B-1} \in [1, J]$ and $z_{B-1} \in C_{\hat{s}_{B-1}}$. One can show that the decoder obtains the correct $w_{c,B-1}$ as long as $n$ and $B$ are large and

$$R_0 + (\hat{R} - R_0) + R_c + R_1 < I(U, V, X_2; Y) - I(U; S | V, X_2). \tag{D-6}$$

Step (c): The decoder knows $\hat{w}_{c,B-1}$ and can again obtain the correct $s_{B-2}$ if $n$ is large and (D-5) is true. This is accomplished by looking for the unique $\hat{s}_{B-2}$ such that the vector $\mathbf{x}_2(\hat{w}_{c,B-1}, \hat{s}_{B-2})$ is jointly typical with $\mathbf{y}[B-1]$.
Step (d): The decoder calculates a set $\mathcal{L}(\mathbf{y}[B-1])$ of $z_{B-1}$ such that $z_{B-1} \in \mathcal{L}(\mathbf{y}[B-1])$ if $\mathbf{v}(\hat{w}_{c,B-1}, \hat{s}_{B-2}, z_{B-1})$, $\mathbf{x}_2(\hat{w}_{c,B-1}, \hat{s}_{B-2})$ and $\mathbf{y}[B-1]$ are jointly typical. It then declares that $\hat{z}_{B-1}$ was sent in block $B-1$ if

$$\hat{z}_{B-1} \in C_{\hat{s}_{B-1}} \cap \mathcal{L}(\mathbf{y}[B-1]). \tag{D-7}$$

One can show that $\hat{z}_{B-1} = z_{B-1}$ with arbitrarily high probability provided that $n$ is sufficiently large and

$$\hat{R} < I(V; Y | X_2) + R_0. \tag{D-8}$$

Step (e): Finally, the decoder, which now knows message $\hat{w}_{c,B-1}$, the cell index $\hat{s}_{B-2}$ and the compression index $\hat{z}_{B-1} \in C_{\hat{s}_{B-1}}$, estimates $w_{1,B-1}$ using $\mathbf{y}[B-1]$. It declares that $\hat{w}_{1,B-1}$ was sent if there exists a unique $\hat{w}_{1,B-1}$ such that $\mathbf{x}_2(\hat{w}_{c,B-1}, \hat{s}_{B-2})$, $\mathbf{v}(\hat{w}_{c,B-1}, \hat{s}_{B-2}, \hat{z}_{B-1})$, $\mathbf{u}(\hat{w}_{c,B-1}, \hat{s}_{B-2}, \hat{z}_{B-1}, \hat{w}_{1,B-1}, j_{B-1})$ and $\mathbf{y}[B-1]$ are jointly typical for some $j_{B-1} \in [1, J]$. One can show that the decoder obtains the correct $w_{1,B-1}$ as long as $n$ is large and

$$R_1 < I(U; Y | V, X_2) - I(U; S | V, X_2). \tag{D-9}$$
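The list-and-bin rule of Step (d) amounts to intersecting the set of compression indices that look typical with the received block and the cell decoded in Step (a), and accepting the result only when it is a single index. A minimal sketch of that intersection step follows; how the typical list is computed is left out, and the function name and return convention are hypothetical.

```python
def decode_compression_index(typical_list, cell):
    """Return the unique index lying in both the typical list and the decoded cell.

    typical_list: indices z whose codeword v(., ., z) passed the typicality test.
    cell: indices belonging to the decoded cell.
    Returns None when the intersection is empty or contains several indices,
    which models a decoding failure.
    """
    common = sorted(set(typical_list) & set(cell))
    return common[0] if len(common) == 1 else None
```

The condition (D-8) can be read as ensuring that, with high probability, only the true index survives the intersection, since the cell removes a fraction of the wrong candidates on top of what the typicality test removes.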

2) Decoding in Block $b$, $b = B-1, B-2, \ldots, 2$:

Next, for $b$ ranging from $B-1$ to $2$, the decoding of the pair $(w_{c,b-1}, w_{1,b-1})$ is performed similarly, in five steps, by using the information $\mathbf{y}[b]$ received in block $b$ and the information $\mathbf{y}[b-1]$ received in block $b-1$. More specifically, this is done as follows.

Step (a): The decoder knows $\hat{w}_{c,b}$ and looks for the unique cell index $\hat{s}_{b-1}$ such that the vector $\mathbf{x}_2(\hat{w}_{c,b}, \hat{s}_{b-1})$ is jointly typical with $\mathbf{y}[b]$. The decoding error in this step is small for sufficiently large $n$ if (D-5) is true.
Step (b): The decoder knows $\hat{s}_{b-1}$ and decodes message $w_{c,b-1}$ from $\mathbf{y}[b]$. It looks for the unique $\hat{w}_{c,b-1}$ such that $\mathbf{x}_2(\hat{w}_{c,b-1}, s_{b-2})$, $\mathbf{v}(\hat{w}_{c,b-1}, s_{b-2}, z_{b-1})$, $\mathbf{u}(\hat{w}_{c,b-1}, s_{b-2}, z_{b-1}, w_{1,b-1}, j_{b-1})$ and $\mathbf{y}[b-1]$ are jointly typical for some $s_{b-2} \in [1, M_0]$, $w_{1,b-1} \in [1, M_1]$, $j_{b-1} \in [1, J]$ and $z_{b-1} \in C_{\hat{s}_{b-1}}$. One can show that the decoding error in this step is small for sufficiently large $n$ if (D-6) is true.

Step (c): The decoder knows $\hat{w}_{c,b-1}$ and obtains $\hat{s}_{b-2}$ by looking for the unique $\hat{s}_{b-2}$ such that the vector $\mathbf{x}_2(\hat{w}_{c,b-1}, \hat{s}_{b-2})$ is jointly typical with $\mathbf{y}[b-1]$. For sufficiently large $n$, the decoder obtains the correct $s_{b-2}$ with high probability if (D-5) is true.

Step (d): The decoder calculates a set $\mathcal{L}(\mathbf{y}[b-1])$ of $z_{b-1}$ such that $z_{b-1} \in \mathcal{L}(\mathbf{y}[b-1])$ if $\mathbf{v}(\hat{w}_{c,b-1}, \hat{s}_{b-2}, z_{b-1})$, $\mathbf{x}_2(\hat{w}_{c,b-1}, \hat{s}_{b-2})$ and $\mathbf{y}[b-1]$ are jointly typical. It then declares that $\hat{z}_{b-1}$ was sent in block $b-1$ if

$$\hat{z}_{b-1} \in C_{\hat{s}_{b-1}} \cap \mathcal{L}(\mathbf{y}[b-1]). \tag{D-10}$$

One can show that, for large $n$, $\hat{z}_{b-1} = z_{b-1}$ with arbitrarily high probability provided that (D-8) is true.
Step (e): Finally, the decoder knows message $\hat{w}_{c,b-1}$, the cell index $\hat{s}_{b-2}$ and the compression index $\hat{z}_{b-1} \in C_{\hat{s}_{b-1}}$, and estimates $w_{1,b-1}$ using $\mathbf{y}[b-1]$. It declares that $\hat{w}_{1,b-1}$ was sent if there exists a unique $\hat{w}_{1,b-1}$ such that $\mathbf{x}_2(\hat{w}_{c,b-1}, \hat{s}_{b-2})$, $\mathbf{v}(\hat{w}_{c,b-1}, \hat{s}_{b-2}, \hat{z}_{b-1})$, $\mathbf{u}(\hat{w}_{c,b-1}, \hat{s}_{b-2}, \hat{z}_{b-1}, \hat{w}_{1,b-1}, j_{b-1})$ and $\mathbf{y}[b-1]$ are jointly typical for some $j_{b-1} \in [1, J]$. One can show that the decoding error in this step is small for sufficiently large $n$ if (D-9) is true.

Fourier-Motzkin Elimination: From the above, we get that the error probability is small provided that $n$ is large and

\begin{align}
R_0 &< I(X_2; Y) \tag{D-11a}\\
\hat{R} &< I(V; Y | X_2) + R_0 \tag{D-11b}\\
\hat{R} &> I(V; S | X_2) \tag{D-11c}\\
R_1 &< I(U; Y | V, X_2) - I(U; S | V, X_2) \tag{D-11d}\\
R_c + R_1 + \hat{R} &< I(U, V, X_2; Y) - I(U; S | V, X_2). \tag{D-11e}
\end{align}

Applying Fourier-Motzkin Elimination (FME) to project out $\hat{R}$ and $R_0$ from (D-11), we get

\begin{align}
0 &< I(V, X_2; Y) - I(V, X_2; S) \tag{D-12a}\\
R_1 &< I(U; Y | V, X_2) - I(U; S | V, X_2) \tag{D-12b}\\
R_c + R_1 &< I(U, V, X_2; Y) - I(U, V, X_2; S). \tag{D-12c}
\end{align}

3) Bounds on $|\mathcal{V}|$ and $|\mathcal{U}|$: It remains to show that the rate region (20) is not altered if one restricts the random variables $V$ and $U$ to have their alphabet sizes limited as indicated in (22). This is done by a standard application of the support lemma [42, p. 310], essentially by following the lines in the proof of Theorem 1 in Appendix B and noticing that, this time, because of the additional nonnegativity constraint, one more functional needs to be preserved in bounding the cardinality of $\mathcal{V}$,

W, X 2 ; Y) - I^V, X 2 ; S) = H fl (Y) - H fl (S) + H f ,(X 2 , S\V) - H f ,(X 2 , Y\V). (D-13) 

This concludes the proof of Corollary 1.

E. Proof of Theorem 3

We prove that for any $(M_c, M_1, n, \epsilon)$ code consisting of a mapping $\varphi_1 : \mathcal{W}_c \times \mathcal{W}_1 \times \mathcal{S}^n \to \mathcal{X}_1^n$ at Encoder 1, a sequence of mappings $\varphi_{2,i} : \mathcal{W}_c \times \mathcal{S}^{i-1} \to \mathcal{X}_2$, $i = 1, \ldots, n$, at Encoder 2, and a mapping $\psi : \mathcal{Y}^n \to \mathcal{W}_c \times \mathcal{W}_1$ at the decoder with average error probability $P_e^n \to 0$ as $n \to \infty$ and rates $R_c = n^{-1} \log_2 M_c$ and $R_1 = n^{-1} \log_2 M_1$, the rate pair $(R_c, R_1)$ must satisfy (23).

Fix $n$ and consider a given code of block length $n$. The joint probability mass function on $\mathcal{W}_c \times \mathcal{W}_1 \times \mathcal{S}^n \times \mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n$ is given by

$$P(w_c, w_1, s^n, x_1^n, x_2^n, y^n) = P(w_c, w_1) \prod_{i=1}^{n} P(s_i) P(x_{1,i} | w_c, w_1, s^n) P(x_{2,i} | w_c, s^{i-1}) P(y_i | x_{1,i}, x_{2,i}, s_i), \tag{E-1}$$

where $P(x_{1,i} | w_c, w_1, s^n)$ is equal to 1 if $x_{1,i} = f_1(w_c, w_1, s^n)$ and 0 otherwise; and $P(x_{2,i} | w_c, s^{i-1})$ is equal to 1 if $x_{2,i} = f_2(w_c, s^{i-1})$ and 0 otherwise.

The proof of the bound on $R_1$ follows trivially by revealing the state $S^n$ to the decoder.

The proof of the bound on the sum rate $R_c + R_1$ is as follows. The decoder map $\psi$ recovers $(W_c, W_1)$ from $Y^n$ with vanishing average error probability. By Fano's inequality, we have

$$H(W_c, W_1 | Y^n) \le n\epsilon_n, \tag{E-2}$$

where $\epsilon_n \to 0$ as $P_e^n \to 0$. Then

\begin{align}
n(R_c + R_1) &= H(W_c, W_1) \nonumber\\
&= I(W_c, W_1; Y^n) + H(W_c, W_1 | Y^n) \nonumber\\
&\overset{(a)}{\le} I(W_c, W_1; Y^n) + n\epsilon_n \nonumber\\
&= I(W_c, W_1, S^n; Y^n) - I(S^n; Y^n | W_c, W_1) + n\epsilon_n \nonumber\\
&= \Big( \sum_{i=1}^{n} I(W_c, W_1, S^n; Y_i | Y^{i-1}) \Big) - H(S^n | W_c, W_1) + H(S^n | W_c, W_1, Y^n) + n\epsilon_n \nonumber\\
&\overset{(b)}{=} \sum_{i=1}^{n} H(Y_i | Y^{i-1}) - H(Y_i | W_c, W_1, S^n, Y^{i-1}) - H(S_i) + H(S_i | W_c, W_1, Y^n, S^{i-1}) + n\epsilon_n \nonumber\\
&\overset{(c)}{\le} \sum_{i=1}^{n} H(Y_i) - H(Y_i | X_{1,i}, X_{2,i}, S_i) - H(S_i) + H(S_i | W_c, W_1, Y^n, S^{i-1}, X_{2,i}) + n\epsilon_n \nonumber\\
&\overset{(d)}{\le} \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}, S_i; Y_i) - H(S_i) + H(S_i | X_{2,i}, Y_i) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}, S_i; Y_i) - I(S_i; X_{2,i}, Y_i) + n\epsilon_n \nonumber\\
&= \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}; Y_i | S_i) - I(S_i; X_{2,i} | Y_i) + n\epsilon_n, \tag{E-3}
\end{align}

where: (a) follows from Fano's inequality; (b) follows from the fact that the state $S^n$ is i.i.d. and is independent of the messages; (c) follows from $(W_c, W_1, S^n, Y^{i-1}) \leftrightarrow (X_{1,i}, X_{2,i}, S_i) \leftrightarrow Y_i$ and the fact that $X_{2,i}$ is a deterministic function of $(W_c, S^{i-1})$; and (d) follows from the fact that conditioning reduces entropy.
Finally, we obtain the desired bound from (E-3) by standard single-letterization [42].

F. Proof of Corollary 2

Relaxing the constraint on $R_1$ in Theorem 1, we obtain

$$C = \max\, I(U, V, X_2; Y) - I(U, V, X_2; S) \tag{F-1}$$

where the maximization is over joint measures $P_{S,U,V,X_1,X_2,Y}$ of the form

$$P_{S,U,V,X_1,X_2,Y} = Q_S P_{X_2} P_{V|S,X_2} P_{U,X_1|S,V,X_2} W_{Y|S,X_1,X_2}. \tag{F-2}$$

The corollary then follows by substituting $K = (U, V)$, and noticing that the distribution on $(S, K, X_1, X_2, Y)$ is given by

\begin{align}
P_{S,K,X_1,X_2,Y} &= P_{S,U,V,X_1,X_2,Y} \tag{F-3}\\
&= Q_S P_{X_2} P_{V|S,X_2} P_{U,X_1|S,V,X_2} W_{Y|S,X_1,X_2} \tag{F-4}\\
&= Q_S P_{X_2} P_{U,V|S,X_2} P_{X_1|S,U,V,X_2} W_{Y|S,X_1,X_2} \tag{F-5}\\
&= Q_S P_{X_2} P_{K|S,X_2} P_{X_1|S,K,X_2} W_{Y|S,X_1,X_2}. \tag{F-6}
\end{align}

G. Proof of Theorem® 

1) Direct Part: The achievability follows by ignoring the strictly causal part of the state at Encoder 2, and 
using the generalized dirty paper coding scheme of [5, Theorem 7]. 
2) Converse Part: For the converse part, we use the outer bound of Theorem 3 for the discrete MAC, which can be readily extended to memoryless channels with discrete time and continuous alphabets using standard techniques [46]. We thus obtain an outer bound on the capacity region of the Gaussian MAC in terms of the closure of the convex hull of the set of rate pairs $(R_c, R_1)$ satisfying

\begin{align}
R_1 &< I(X_1; Y | S, X_2), \nonumber\\
R_c + R_1 &< I(X_1, X_2; Y | S) - I(X_2; S | Y), \tag{G-1}
\end{align}

for some probability distribution of the form $P_{S,X_1,X_2,Y} = Q_S P_{X_2} P_{X_1|X_2,S} W_{Y|X_1,X_2,S}$ such that $E[X_1^2] \le P_1$ and $E[X_2^2] \le P_2$. The rest of the converse proof follows by reasoning and algebra similar to those in the proofs of [5, Theorem 7] and [11, Theorem 4], and is omitted for brevity.



References 

[1] E. Biglieri, J. Proakis, and S. Shamai (Shitz), "Fading channels: Information-theoretic and communication aspects," IEEE Trans. Inf. Theory, vol. 44, pp. 2619-2692, Oct. 1998.
[2] C. E. Shannon, "Channels with side information at the transmitter," IBM Journal of Research and Development, vol. 2, pp. 289-293, Oct. 1958.
[3] S. I. Gel'fand and M. S. Pinsker, "Coding for channel with random parameters," Problems of Control and Information Theory, vol. 9, pp. 19-31, 1980.
[4] G. Keshet, Y. Steinberg, and N. Merhav, "Channel coding in the presence of side information: subject review," Foundations and Trends in Communications and Information Theory, 2008.
[5] A. Somekh-Baruch, S. Shamai (Shitz), and S. Verdú, "Cooperative multiple access encoding with states available at one transmitter," IEEE Trans. Inf. Theory, vol. 54, pp. 4448-4469, Oct. 2008.
[6] S. Kotagiri and J. N. Laneman, "Multiaccess channels with state known to some encoders and independent messages," EURASIP Journal on Wireless Communications and Networking, vol. Article ID 450680. doi:10.1155/2008/450680, 2008.
[7] A. Zaidi, S. Kotagiri, J. N. Laneman, and L. Vandendorpe, "Multiaccess channels with state known to one encoder: Another case of degraded message sets," in Proc. IEEE Int. Symp. Information Theory, Seoul, Korea, Jun.-Jul. 2009, pp. 2376-2380.
[8] A. Khisti, U. Erez, A. Lapidoth, and G. Wornell, "Carbon copying onto dirty paper," IEEE Trans. Inf. Theory, vol. 53, pp. 1814-1827, May 2007.
[9] T. Philosoph, A. Khisti, U. Erez, and R. Zamir, "Lattice strategies for the dirty multiple access channel," in Proc. IEEE Int. Symp. Information Theory, Nice, France, Jun. 2007, pp. 386-390.
[10] A. Zaidi, S. Kotagiri, J. N. Laneman, and L. Vandendorpe, "Cooperative relaying with state at the relay," in Proc. IEEE Information Theory Workshop, Porto, Portugal, May 2008, pp. 139-143.
[11] A. Zaidi, S. Kotagiri, J. N. Laneman, and L. Vandendorpe, "Cooperative relaying with state available non-causally at the relay," IEEE Trans. Inf. Theory, vol. 56, pp. 2272-2298, May 2010.
[12] A. Zaidi and L. Vandendorpe, "Lower bounds on the capacity of the relay channel with states at the source," EURASIP Journal on Wireless Communications and Networking, vol. Article ID 634296. doi:10.1155/2009/634296, 2009.






[13] Y. Cemal and Y. Steinberg, "The multiple-access channel with partial state information at the encoders," IEEE Trans. Inf. Theory, vol. IT-51, pp. 3992-4003, Nov. 2005.
[14] Y. Steinberg, "Coding for the degraded broadcast channel with random parameters, with causal and noncausal side information," IEEE Trans. Inf. Theory, vol. IT-51, pp. 2867-2877, Aug. 2005.
[15] A. Lapidoth and Y. Steinberg, "The multiple access channel with causal and strictly causal side information at the encoders," in Proc. Int. Zurich Seminar on Communications (IZS), Zurich, Switzerland, Mar. 2010, pp. 13-16.
[16] A. Lapidoth and Y. Steinberg, "The multiple access channel with two independent states each known causally at one encoder," in Proc. IEEE Int. Symp. Information Theory, Austin, TX, USA, Jun. 2010, pp. 480-484.
[17] H. Permuter, S. Shamai (Shitz), and A. Somekh-Baruch, "Message and state cooperation in multiple access channels," IEEE Trans. Inf. Theory, vol. 57, pp. 6379-6396, Oct. 2011.
[18] M. Li, O. Simeone, and A. Yener, "Multiple access channels with states causally known at transmitters," Submitted for publication in IEEE Trans. Inf. Theory. Available at http://arxiv.org/abs/1011.6639, 2010.
[19] M. Li, O. Simeone, and A. Yener, "Message and state cooperation in a relay channel when only the relay knows the state," Submitted for publication in IEEE Trans. Inf. Theory. Available at http://arxiv.org/abs/1102.0768, 2011.
[20] B. Akhbari, M. Mirmohseni, and M. R. Aref, "Compress-and-forward strategy for the relay channel with non-causal state information," in Proc. IEEE Int. Symp. Information Theory, Seoul, Korea, Jun.-Jul. 2009, pp. 1169-1173.
[21] M. N. Khormuji and M. Skoglund, "On cooperative downlink transmission with frequency reuse," in Proc. IEEE Int. Symp. Information Theory, Seoul, Korea, Jun.-Jul. 2009, pp. 849-853.
[22] G. Como and S. Yüksel, "On the capacity of memoryless finite state multiple-access channels with asymmetric
state information at the encoders," accepted for publication in IEEE Trans. Inf. Theory. Available at
http://arxiv.org/abs/1011.1012, 2011.
[23] N. Şen, G. Como, S. Yüksel, and F. Alajaji, "On the capacity of memoryless finite-state multiple access channels
with asymmetric noisy state information at the encoders," in Proc. IEEE Int. Symp. Information Theory 2011,
submitted for publication. Available at http://arxiv.org/abs/1103.3054, 2011.
[24] R. Khosravi-Farsani and F. Marvasti, "Capacity bounds for multiuser channels with non-causal channel state
information at the transmitters," in Proc. IEEE Int. Symp. Information Theory 2011, submitted for publication.
Available at http://arxiv.org/abs/1102.3410, 2011.
[25] S. I. Bross and A. Lapidoth, "The state-dependent multiple-access channel with states available at a cribbing
encoder," in Proc. of IEEE 26-th Convention of Electrical and Electronics Engineers in Israel, Israel, 2010.
[26] S. Jafar, "Capacity with causal and noncausal side-information: A unified view," IEEE Trans. Inf. Theory, vol. 52,
pp. 5468-5474, Dec. 2006.
[27] S. Sigurjonsson and Y. H. Kim, "On multiple user channels with state information at the transmitters," in Proc.
IEEE Int. Symp. Information Theory, Sep. 2005.
[28] C. E. Shannon, "The zero error capacity of a noisy channel," IRE Trans. on Inf. Theory, vol. 2, pp. 8-19, 1956.
[29] G. Dueck, "Partial feedback for two-way and broadcast channels," Inf. Contr., vol. 46, pp. 1-15, 1980. 
[30] S. H. Lim, Y.-H. Kim, A. El Gamal, and S.-Y. Chung, "Noisy network coding," IEEE Trans. Inf. Theory, vol. 57,
pp. 3132-3152, May 2011.
[31] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," 
IEEE Trans. Inf. Theory, vol. 22, pp. 1-10, Jan. 1976. 



[32] A. Lapidoth and Y. Steinberg, "A note on multiple access channels with strictly causal state information,"
available at http://arxiv.org/abs/1106.0380, Jun. 2011.
[33] A. Zaidi, P. Piantanida, and S. Shamai (Shitz), "Multiple access channel with states known noncausally at one
encoder and only strictly causally at the other encoder," in Proc. IEEE Int. Symp. Information Theory, submitted
for publication, 2011.
[34] X. Wu and L.-L. Xie, "On the optimal compressions in the compress-and-forward relay schemes," IEEE Trans.
Inf. Theory, submitted for publication. Available at http://arxiv.org/abs/1009.5959, Feb. 2011.
[35] P. Zhong, A. A. Haija, and M. Vu, "On compress-and-forward without Wyner-Ziv binning for relay networks,"
IEEE Trans. Inf. Theory, submitted for publication. Available at http://arxiv.org/abs/1111.2837, Nov. 2011.
[36] G. Kramer and J. Hou, "On message lengths for noisy network coding," in Proc. IEEE Information Theory Workshop,
Paraty, Brazil, Oct. 2011.
[37] T. M. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inf. Theory, vol. IT-25, pp.
572-584, Sep. 1979.
[38] M. Katz and S. Shamai (Shitz), "Cooperative schemes for a source and an occasional nearby relay in wireless
networks," IEEE Trans. Inf. Theory, vol. 55, pp. 5139-5160, Nov. 2009.
[39] S. S. Pradhan, J. Chou, and K. Ramchandran, "Duality between source coding and channel coding and its
extension to the side information case," IEEE Trans. Inf. Theory, vol. IT-49, pp. 1181-1203, May 2003.
[40] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inf. Theory, vol. 29, pp. 439-441, May 1983. 
[41] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, Inc., 1991.
[42] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. London, U.K.:
Academic Press, 1981.
[43] A. El Gamal and Y.-H. Kim, Lecture Notes on Network Information Theory. Available at
http://arxiv.org/abs/1001.3404, 2010 [Online].
[44] I. Csiszár and J. Körner, "Broadcast channels with confidential messages," IEEE Trans. Inf. Theory, vol. 24, pp.
339-348, 1978.
[45] F. M. J. Willems, Informationtheoretical Results for the Discrete Memoryless Multiple Access Channel. Leuven,
Belgium: Doctor in de Wetenschappen Proefschrift dissertation, Oct. 1982.
[46] R. G. Gallager, Information Theory and Reliable Communication. New York: John Wiley, 1968.






